klotz: production engineering*

Production Engineering focuses on the design, implementation, and management of systems and processes to ensure the efficient and reliable delivery of software and services in a production environment. It involves various aspects such as deploying, monitoring, and maintaining applications, managing infrastructure, and handling data pipelines. Production Engineering KPIs include Availability and Cost.
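As a rough illustration of the Availability KPI mentioned above, the sketch below computes availability as the fraction of a time window without downtime, and the downtime a given target allows. The function names and the 30-day window are assumptions made for this example, not something the tag description defines.

```python
def availability(total_seconds: float, downtime_seconds: float) -> float:
    """Return the fraction of the window during which the service was up."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= downtime_seconds <= total_seconds:
        raise ValueError("downtime_seconds must lie within the window")
    return (total_seconds - downtime_seconds) / total_seconds


def allowed_downtime(total_seconds: float, target: float) -> float:
    """Return the seconds of downtime a target availability permits."""
    if not 0 <= target <= 1:
        raise ValueError("target must be between 0 and 1")
    return total_seconds * (1 - target)


# Hypothetical 30-day window, used only for illustration.
WINDOW_SECONDS = 30 * 24 * 3600

if __name__ == "__main__":
    print(f"availability: {availability(WINDOW_SECONDS, 1800):.5f}")
    print(f"budget at 99.9%: {allowed_downtime(WINDOW_SECONDS, 0.999):.0f} s")
```

Expressing the target as allowed downtime makes it easier to weigh against Cost, since each extra "nine" shrinks the budget tenfold.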


  1. This article describes PhonePe's development of Nimbus, a system for provisioning and managing bare metal servers. Nimbus uses a precooked-metadata approach for server setup, enhancing security, reliability, and efficiency. The system leverages a custom boot firmware, API, and provisioning agent to streamline server lifecycle management, improving consistency, predictability, and issue detection.
  2. This article explores the use of large language models (LLMs) for document parsing, offering a more powerful and flexible alternative to traditional methods like regular expressions. It discusses the workflow involved in processing documents like research papers using LLMs, highlighting the benefits and advantages of this approach.
  3. Hallux.ai is a platform offering open-source, LLM-based CLI tools for Linux and macOS. These tools aim to streamline operations, enhance productivity, and automate workflows for professionals in production engineering, SRE, and DevOps. They also improve Root Cause Analysis (RCA) capabilities and enable self-sufficiency.
  4. Outlier treatment is a necessary step in data analysis. This article, part 3 of a four-part series, simplifies that process and surveys effective methods and tools for outlier detection.
  5. This article explains why CPU limits are considered harmful on Kubernetes. The author provides three analogies about Kubernetes CPU Limits and discusses the best practices for CPU limits and requests on Kubernetes.
  6. A guide to tracking in MLOps, covering the tracking of code, data, and machine learning models.
  7. Plandex is an AI coding agent designed to work directly in the terminal, capable of planning and completing large tasks that span many files and steps. It helps developers build new apps quickly, add features to existing codebases, write tests and scripts, understand code, and fix bugs.
  8. A collection of learning resources for those interested in becoming a Site Reliability Engineer (SRE) at Google, focusing on systems engineering best practices, non-abstract large system design, distributed systems, and reliable data processing.
    2024-07-05 by klotz
  9. This article explains the importance of data validation in a machine learning pipeline and demonstrates how to use TensorFlow Data Validation (TFDV) to validate data. It covers the five stages of machine learning validation: generating statistics from training data, inferring a schema from training data, generating statistics for evaluation data and comparing them with the training data, identifying and fixing anomalies, and checking for drift and data skew.
  10. Kit is a free, open-source MLOps tool that simplifies AI project management by packaging models, datasets, code, and configurations into a standardized, versioned, and tamper-proof ModelKit. It enables collaboration, model traceability, and reproducibility, making it easier to hand off AI projects between data scientists, developers, and DevOps teams.
    2024-06-22 by klotz
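The five validation stages listed in item 9 can be sketched in plain Python to show how the pieces fit together. This is not the TFDV API: every function name here is hypothetical, the schema is reduced to a type and a string domain per column, and drift is measured with an L-infinity distance chosen for illustration.

```python
from collections import Counter


def generate_statistics(rows):
    """Stages 1 and 3: summarize each column's types, values, and missing count.

    Assumes every value is a hashable scalar (str, int, float, ...) or None.
    """
    stats = {}
    for row in rows:
        for column, value in row.items():
            entry = stats.setdefault(
                column, {"types": Counter(), "values": Counter(), "missing": 0}
            )
            if value is None:
                entry["missing"] += 1
            else:
                entry["types"][type(value).__name__] += 1
                entry["values"][value] += 1
    return stats


def infer_schema(stats):
    """Stage 2: derive an expected type (and a domain for strings) per column."""
    schema = {}
    for column, entry in stats.items():
        if not entry["types"]:
            continue  # column was entirely missing, so nothing can be inferred
        expected_type = entry["types"].most_common(1)[0][0]
        domain = set(entry["values"]) if expected_type == "str" else None
        schema[column] = {"type": expected_type, "domain": domain}
    return schema


def find_anomalies(stats, schema):
    """Stage 4: list the ways a dataset's statistics disagree with the schema."""
    anomalies = []
    for column, expected in schema.items():
        entry = stats.get(column)
        if entry is None:
            anomalies.append(f"{column}: column missing")
            continue
        unexpected = set(entry["types"]) - {expected["type"]}
        if unexpected:
            anomalies.append(f"{column}: unexpected types {sorted(unexpected)}")
        if expected["domain"] is not None:
            seen = {v for v in entry["values"] if isinstance(v, str)}
            unseen = seen - expected["domain"]
            if unseen:
                anomalies.append(f"{column}: values outside domain {sorted(unseen)}")
    for column in sorted(stats.keys() - schema.keys()):
        anomalies.append(f"{column}: column not in schema")
    return anomalies


def linf_drift(train_stats, eval_stats, column):
    """Stage 5: largest change in any value's relative frequency."""
    train_counts = train_stats[column]["values"]
    eval_counts = eval_stats[column]["values"]
    train_total = sum(train_counts.values()) or 1
    eval_total = sum(eval_counts.values()) or 1
    return max(
        (
            abs(train_counts[v] / train_total - eval_counts[v] / eval_total)
            for v in set(train_counts) | set(eval_counts)
        ),
        default=0.0,
    )


# Made-up rows, used only to exercise the functions above.
train_rows = [
    {"country": "US", "age": 31},
    {"country": "DE", "age": 45},
    {"country": "US", "age": 27},
    {"country": "DE", "age": 52},
]
eval_rows = [
    {"country": "US", "age": 29},
    {"country": "FR", "age": "n/a"},
]

train_stats = generate_statistics(train_rows)
schema = infer_schema(train_stats)
eval_stats = generate_statistics(eval_rows)

if __name__ == "__main__":
    for anomaly in find_anomalies(eval_stats, schema):
        print(anomaly)
    print("country drift:", linf_drift(train_stats, eval_stats, "country"))
```

In this sketch, "fixing" an anomaly means either cleaning the offending data or deliberately widening the schema, for example by adding a new value to a column's domain once it is known to be legitimate.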
