Tags: observability*

Observability refers to the ability to understand the internal state of a system by observing its output. It involves monitoring, logging, and tracing various other forms of data collection to gain insights into the system's behavior, performance, and health. In the context of cloud engineering, observability is crucial for maintaining the efficiency and reliability of distributed systems, as it helps identify and diagnose issues, optimize performance, and ensure security. Observability tools, such as Splunk, Honeycomb, and OpenTelemetry, are used to collect and analyze metrics, logs, and traces, enabling capacity planning, root cause analysis and incident response.

0 bookmark(s) - Sort by: Date ↓ / Title /

  1. Articles on logging, tracing, and observability including Echopraxia, Blindsight, structured log analysis, and more.

  2. This article details the Model Context Protocol (MCP), a new approach to integrating Large Language Models (LLMs) like Azure OpenAI with tools. MCP focuses on structured data exchange to improve reliability, observability, and functionality, moving beyond simple text-in, text-out interactions. It aims to standardize how LLMs interact with tools, enhancing their ability to utilize those tools effectively.

  3. This Splunk Lantern article outlines the steps to monitor Gen AI applications with Splunk Observability Cloud, covering setup with OpenTelemetry, NVIDIA GPU metrics, Python instrumentation, and OpenLIT integration to monitor GenAI applications built with technologies like Python, LLMs (OpenAI's GPT-4o, Anthropic's Claude 3.5 Haiku, Meta’s Llama), NVIDIA GPUs, Langchain, and vector databases (Pinecone, Chroma) using Splunk Observability Cloud. It outlines a six-step process:

    1. Access Splunk Observability Cloud: Sign up for a free trial if needed.
    2. Deploy Splunk Distribution of OpenTelemetry Collector: Use a Helm chart to install the collector in Kubernetes.
    3. Capture NVIDIA GPU Metrics: Utilize the NVIDIA GPU Operator and Prometheus receiver in the OpenTelemetry Collector.
    4. Instrument Python Applications: Use the Splunk Distribution of OpenTelemetry Python agent for automatic instrumentation and enable Always On Profiling.
    5. Enhance with OpenLIT: Install and initialize OpenLIT to capture detailed trace data, including LLM calls and interactions with vector databases (with options to disable PII capture).
    6. Start Using the Data: Leverage the collected metrics and traces, including features like Tag Spotlight, to identify and resolve performance issues (example given: OpenAI rate limits).

    The article emphasizes OpenTelemetry's role in GenAI observability and highlights how Splunk Observability Cloud facilitates monitoring these complex applications, providing insights into performance, cost, and potential bottlenecks. It also points to resources for help and further information on specific aspects of the process.

  4. The article discusses the challenges faced due to differing observability tools and naming conventions, and how OpenTelemetry's standard naming schemas can streamline workflows and enhance interoperability.

  5. Grafana Loki version 3.4 introduces enhancements such as standardized storage with Thanos, a sizing guidance page, merging of Promtail into Grafana Alloy, and support for out-of-order logs.

  6. This skill path by Bryce Yu guides users through the basics of managing databases on Kubernetes using KubeBlocks. It covers installation, deployment, upgrades, backup, observability, and auto-tuning of database clusters.

  7. Sawmills AI has introduced a smart telemetry data management platform aimed at reducing costs and improving data quality for enterprise observability. By acting as a middleware layer that uses AI and ML to optimize telemetry data before it reaches vendors like Datadog and Splunk, Sawmills helps companies manage data efficiently, retain data sovereignty, and reduce unnecessary data processing costs.

  8. OpenInference is a set of conventions and plugins that complements OpenTelemetry to enable tracing of AI applications, with native support from arize-phoenix and compatibility with other OpenTelemetry-compatible backends.

  9. Arize Phoenix is an open-source observability library for AI experimentation, evaluation, and troubleshooting, built by Arize AI.

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: tagged with "observability"

About - Propulsed by SemanticScuttle