klotz: holmesgpt*

  1. STCLab's SRE team shares their experience building an AI-driven investigation pipeline to automate the triage of Kubernetes alerts. Using HolmesGPT, they implemented a ReAct pattern in which the LLM autonomously selects tools such as Prometheus, Loki, and kubectl based on the context of each alert. The core finding was that high-quality markdown runbooks containing exclusion rules mattered more for successful investigations than the underlying AI model itself.
    Key points:
    * Implementation of HolmesGPT using the ReAct agent pattern for autonomous troubleshooting.
    * Integration with Robusta to manage Slack routing, deduplication, and thread matching.
    * The vital role of runbooks in narrowing search spaces and reducing wasted tool calls.
    * Comparison between self-hosted models via KubeAI and managed API approaches.
    * Significant reduction in manual triage time from 20 minutes to under two minutes per investigation.
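
The ReAct loop and the effect of runbook exclusion rules described above can be illustrated with a minimal sketch. This is not HolmesGPT's API or STCLab's implementation: the tool functions, the `choose_action` stand-in for the LLM's reasoning step, and the `excluded` set standing in for runbook exclusion rules are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, FrozenSet, Optional

# Hypothetical stand-ins for real tool integrations; each returns a canned observation.
def query_prometheus(alert: str) -> str:
    return f"metrics checked for {alert}"

def query_loki(alert: str) -> str:
    return f"logs checked for {alert}"

def run_kubectl(alert: str) -> str:
    return f"pod status checked for {alert}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "prometheus": query_prometheus,
    "loki": query_loki,
    "kubectl": run_kubectl,
}

def choose_action(observations: Dict[str, str], excluded: FrozenSet[str]) -> Optional[str]:
    """Stand-in for the LLM's reasoning step (assumption): pick the next tool
    that has not been used yet and is not ruled out by the runbook."""
    for name in TOOLS:
        if name not in excluded and name not in observations:
            return name
    return None  # nothing left to try, so the investigation ends

def investigate(alert: str, excluded: FrozenSet[str] = frozenset(), max_steps: int = 5) -> Dict[str, str]:
    """Reason -> act -> observe loop, bounded by max_steps."""
    observations: Dict[str, str] = {}
    for _ in range(max_steps):
        action = choose_action(observations, excluded)   # reason
        if action is None:
            break
        observations[action] = TOOLS[action](alert)      # act and observe
    return observations

if __name__ == "__main__":
    # A runbook exclusion rule (hypothetical) says logs are irrelevant for this alert.
    result = investigate("KubePodCrashLooping", excluded=frozenset({"loki"}))
    for tool, observation in result.items():
        print(tool, "->", observation)
```

In this sketch the exclusion set removes one tool from consideration, so the loop makes two tool calls instead of three, which is the sense in which a runbook narrows the search space.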
