At GrafanaCON 2026, Grafana Labs announced significant updates including the launch of Grafana 13 and a major architectural overhaul for Loki. The new Loki design moves away from replication-at-ingestion toward using Kafka as a durability layer to reduce data duplication and improve query performance. Additionally, the company introduced GCX, a new CLI tool in public preview designed to integrate observability data directly into agentic development environments like Claude Code and Cursor, allowing engineers to resolve production issues without leaving their coding tools.
Key points:
- Loki rearchitected with Kafka to reduce storage overhead and improve query speed.
- Introduction of GCX CLI for seamless observability integration within AI coding agents.
- Launch of Grafana 13 featuring dynamic dashboards and expanded data source support.
- New AI Observability product in public preview for monitoring LLM applications.
STCLab's SRE team shares their experience building an AI-driven investigation pipeline to automate the triage of Kubernetes alerts. Using HolmesGPT, they implemented a ReAct pattern that lets an LLM autonomously select tools like Prometheus, Loki, and kubectl based on each alert's context. The core finding was that high-quality markdown runbooks containing exclusion rules mattered more to successful investigations than the underlying AI model itself.
Key points:
* Implementation of HolmesGPT using the ReAct agent pattern for autonomous troubleshooting.
* Integration with Robusta to manage Slack routing, deduplication, and thread matching.
* The vital role of runbooks in narrowing search spaces and reducing wasted tool calls.
* Comparison between self-hosted models via KubeAI and managed API approaches.
* Significant reduction in manual triage time from 20 minutes to under two minutes per investigation.
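The ReAct loop described above can be sketched as follows. This is an illustrative stand-in, not HolmesGPT's actual implementation: the tool stubs and the `llm_decide` heuristic are hypothetical placeholders for real tool integrations and a real model call.

```python
# Minimal ReAct-style investigation loop (illustrative only; the tool
# registry and decision function are stand-ins, not HolmesGPT internals).

def query_prometheus(q): return f"metric data for {q}"   # stub tool
def query_loki(q): return f"log lines matching {q}"      # stub tool
def kubectl_describe(obj): return f"status of {obj}"     # stub tool

TOOLS = {
    "prometheus": query_prometheus,
    "loki": query_loki,
    "kubectl": kubectl_describe,
}

def llm_decide(alert, observations):
    """Stand-in for the model: pick the next tool, or finish.
    A real agent would send the alert, the runbook, and prior
    observations to an LLM and parse its chosen action."""
    if not observations:
        return ("kubectl", alert["resource"])
    if len(observations) == 1:
        return ("loki", alert["resource"])
    return ("finish", "pod is CrashLooping due to OOM")

def investigate(alert, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, arg = llm_decide(alert, observations)  # Reason
        if action == "finish":
            return {"alert": alert["name"], "conclusion": arg,
                    "evidence": observations}
        observations.append((action, TOOLS[action](arg)))  # Act, then Observe
    return {"alert": alert["name"], "conclusion": "inconclusive",
            "evidence": observations}

result = investigate({"name": "KubePodCrashLooping", "resource": "pod/api-7f9c"})
```

The runbook's role in the article maps onto `llm_decide`: exclusion rules shrink the set of tools the model even considers, which is why runbook quality dominated model choice.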
As AI agents evolve from autocomplete tools to active contributors (opening PRs, managing infrastructure), DevOps must adapt. This playbook outlines the shift through these key strategic pillars:
* **Foundational Prerequisites:** Robust CI/CD, automated testing, and Infrastructure as Code are essential for agentic workflows.
* **Evolving Engineering Roles:** Engineers transition from code producers to system designers, agent operators, and quality stewards.
* **Structured Collaboration:** Integration across IDEs, PRs, pipelines, and production environments is required.
* **Repository Design:** Repositories must act as explicit interfaces using skill profiles and instruction files.
* **Development Methodology:** Shift from ephemeral prompt engineering to durable, specification-driven development.
* **Governance & Security:** Establish frameworks that keep custom agents consistent and auditable, and turn CI/CD pipelines into active verifiers of semantic intent and security.
* **New Success Metrics:** Move from volume-based productivity counts to outcome-based and trust-boundary measurements.
Prove AI is developing an observability-first foundation designed for production generative AI systems. Their mission is to enable engineering teams to understand, diagnose, and remediate failures within complex AI pipelines, including LLM inference, retrieval processes, and agent orchestration.
The current release, v0.1, provides an opinionated observability pipeline specifically for generative AI workloads through:
- A containerized, OpenTelemetry-based telemetry pipeline.
- Preconfigured collection of traces, metrics, and logs tailored for AI systems.
- Instrumentation patterns for RAG pipelines, embeddings, LLM inference, and agent-based systems.
- Compatibility with standard backends like Prometheus.
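To make the trace shape concrete, here is a hand-rolled sketch of the span tree one RAG request might produce. This is not the Prove AI pipeline or the OpenTelemetry SDK; the attribute names loosely follow OpenTelemetry's GenAI semantic conventions (e.g. `gen_ai.request.model`), and the specific values are invented for illustration.

```python
import time
import uuid

def make_span(name, attributes, parent_id=None):
    """Hand-rolled span record illustrating the trace shape an
    OpenTelemetry-based pipeline would export (not the real SDK)."""
    return {"span_id": uuid.uuid4().hex[:16], "parent_id": parent_id,
            "name": name, "start": time.time(), "attributes": attributes}

# One RAG request becomes a small tree of spans:
root = make_span("rag.request", {"user.query": "refund policy?"})
retrieval = make_span("rag.retrieval",
                      {"db.system": "vector", "retrieval.top_k": 4},
                      parent_id=root["span_id"])
inference = make_span("gen_ai.inference",
                      {"gen_ai.request.model": "gpt-4o",   # convention-style attrs
                       "gen_ai.usage.input_tokens": 812,
                       "gen_ai.usage.output_tokens": 96},
                      parent_id=root["span_id"])

trace = [root, retrieval, inference]
# A collector would batch spans like these and derive metrics
# (latency, token counts) for a backend such as Prometheus.
```

The value of preconfigured collection is exactly this structure: retrieval and inference are separate, correlated spans, so a slow answer can be attributed to the vector store or the model rather than to "the AI" as a whole.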
This article examines the development of Microsoft’s Azure SRE Agent, designed to mitigate operational toil in mission-critical environments. By utilizing an "agentic workflow" of specialized AI agents, Microsoft has integrated automation across the entire software development lifecycle. This human-AI partnership has autonomously resolved over 35,000 incidents and saved more than 50,000 developer hours, accelerating root cause analysis and mitigation while maintaining rigorous governance and human oversight.
This LLMOps tooling roundup focuses on orchestration, observability, and evaluation:
* **PydanticAI:** type-safe outputs for LLMs, supporting multiple models and complex workflows for more reliable software-like behavior.
* **Bifrost:** gateway for multiple models/providers, offering a single API with features like failover, load balancing, and observability.
* **Traceloop / OpenLLMetry:** instruments LLM applications with OpenTelemetry for standardized tracing.
* **Promptfoo:** automated evaluation of prompts and models, with checks that run in CI/CD pipelines.
* **Invariant Guardrails:** runtime rules between applications and LLMs/tools, enforcing constraints without code changes.
* **Letta:** version-controlled memory for agents, tracking interactions like a Git repository for debugging and rollback.
* **OpenPipe:** continuous model improvement through logging, data export, evaluation, and fine-tuning within a single platform.
* **Argilla:** human feedback and data curation for tasks like annotation and error analysis, improving model performance.
* **KitOps:** Packages models, datasets, prompts, and configurations into versioned artifacts for clean deployments and reproducibility.
* **Composio:** authentication, permissions, and execution for agents interacting with hundreds of external applications.
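The idea behind type-safe outputs (the PydanticAI entry above) can be shown with the standard library alone. This is not PydanticAI's API; it is a stdlib-only sketch of the underlying pattern: parse the model's JSON reply and validate it against a declared schema before any downstream code touches it. The `Incident` schema and the sample reply are invented for illustration.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Incident:
    service: str
    severity: str
    summary: str

def parse_llm_output(raw: str, schema=Incident):
    """Validate a model's JSON reply against a dataclass schema --
    the core idea behind type-safe LLM outputs (stdlib-only sketch)."""
    data = json.loads(raw)
    expected = {f.name for f in fields(schema)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"model output missing fields: {sorted(missing)}")
    # Drop any extra keys the model hallucinated, keep only schema fields.
    return schema(**{k: data[k] for k in expected})

reply = '{"service": "checkout", "severity": "high", "summary": "p99 latency spike"}'
incident = parse_llm_output(reply)
```

Libraries in this space add retries, coercion, and multi-model support on top, but the contract is the same: malformed output fails loudly at the boundary instead of propagating into application logic.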
This article details how HPE is addressing operational fatigue and burnout in IT teams through the introduction of agentic AI operations. HPE's new system utilizes skills-based AI agents that work alongside human operators to reduce alert noise, improve response times, and cut root cause analysis time by at least half, according to early adopters.
The focus is on augmenting human capabilities rather than replacing them, with a strong emphasis on auditability, transparency, and human oversight in AI-driven actions. The system aims to break down data silos and provide proactive insights to prevent issues before they escalate.
This article discusses how AI is changing infrastructure as code (IaC) and the challenges it presents. Spacelift's co-founder, Marcin Wyszynski, explains that while AI tools can democratize infrastructure provisioning, a lack of understanding of the generated code poses risks. He draws a parallel to learning a foreign language: AI can produce the code, but teams need to comprehend it to avoid potentially disastrous infrastructure changes.
Spacelift's solution, Intent, focuses on deterministic guardrails and integration with tools like Open Policy Agent to ensure safe and controlled AI-driven infrastructure management. The core challenge is balancing speed and control in a rapidly evolving landscape.
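Intent's internals are not public, but the "deterministic guardrail" idea can be sketched as a policy check over a Terraform plan, the kind of rule Open Policy Agent policies typically encode. This is an assumption-laden sketch, not Spacelift's implementation; the field names follow the `terraform show -json` plan format, and the sample plan is invented.

```python
# Deterministic guardrail sketch: flag any plan that deletes resources.
# Field names follow Terraform's JSON plan output (resource_changes,
# change.actions); the policy and sample plan are illustrative.

FORBIDDEN_ACTIONS = {"delete"}

def check_plan(plan: dict) -> list[str]:
    """Return the addresses of resources whose planned actions
    include a forbidden (destructive) action."""
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        if actions & FORBIDDEN_ACTIONS:
            violations.append(rc["address"])
    return violations

plan = {"resource_changes": [
    {"address": "aws_db_instance.main",
     "change": {"actions": ["delete"]}},      # destructive: flagged
    {"address": "aws_s3_bucket.assets",
     "change": {"actions": ["update"]}},      # in-place change: allowed
]}

blocked = check_plan(plan)
```

Because the check runs on the plan rather than on the AI's prose, it holds regardless of how the code was generated, which is the point of putting deterministic gates around nondeterministic authors.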
Three vendors, Cohesity, ServiceNow, and Datadog, have partnered to create a recoverability service designed to address the risks of agentic AI operations (AIOps). The service aims to restore systems to a "trusted state" by identifying and recovering files and data corrupted by AI errors or malicious attacks.
The companies anticipate increased adoption of agentic AI for system operation but recognize the potential for errors and vulnerabilities. Their solution focuses on preserving immutable snapshots of AI environments, enabling point-in-time recovery of agents, data, and infrastructure components, including vector stores and agent memory.
ServiceNow and Datadog provide control and observability platforms to detect anomalies, triggering API-driven restorations when problems are identified. This offering competes with Rubrik's similar tool and native rollback capabilities from vendors like Cisco. Gartner predicts a significant increase in the integration of task-specific agents in enterprise applications, while Forrester emphasizes the need for guardrails and strong oversight in agentic AI development.
An account of how a developer, Alexey Grigorev, accidentally deleted 2.5 years of data from his AI Shipping Labs and DataTalks.Club websites using Claude Code and Terraform. Grigorev intended to migrate his website to AWS, but a missing state file and subsequent actions by Claude Code led to a complete wipe of the production setup, including the database and snapshots. The data was ultimately restored with help from Amazon Business support. The article highlights the importance of backups, careful permissions management, and manual review of potentially destructive actions performed by AI agents.