SemanticScuttle - klotz.me » Tags: observability

Tags: observability*

Observability refers to the ability to understand the internal state of a system by observing its output. It involves monitoring, logging, and tracing various other forms of data collection to gain insights into the system's behavior, performance, and health. In the context of cloud engineering, observability is crucial for maintaining the efficiency and reliability of distributed systems, as it helps identify and diagnose issues, optimize performance, and ensure security. Observability tools, such as Splunk, Honeycomb, and OpenTelemetry, are used to collect and analyze metrics, logs, and traces, enabling capacity planning, root cause analysis and incident response.

0 bookmark(s) - Sort by: Date ↓ / Title /

Logging, Tracing, and Observability in General

Articles on logging, tracing, and observability including Echopraxia, Blindsight, structured log analysis, and more.

2025-04-04 Tags: logging, tracing, observability, echopraxia, blindsight, structured log analysis, scala, java by klotz

Model Context Protocol (MCP): Integrating Azure OpenAI for enhanced tool integration

This article details the Model Context Protocol (MCP), a new approach to integrating Large Language Models (LLMs) like Azure OpenAI with tools. MCP focuses on structured data exchange to improve reliability, observability, and functionality, moving beyond simple text-in, text-out interactions. It aims to standardize how LLMs interact with tools, enhancing their ability to utilize those tools effectively.

2025-04-01 Tags: azure, openai, llm, model context protocol, mcp, tools, observability, reliability, agents by klotz

Monitoring Gen AI apps with NVIDIA GPUs

This Splunk Lantern article outlines the steps to monitor Gen AI applications with Splunk Observability Cloud, covering setup with OpenTelemetry, NVIDIA GPU metrics, Python instrumentation, and OpenLIT integration to monitor GenAI applications built with technologies like Python, LLMs (OpenAI's GPT-4o, Anthropic's Claude 3.5 Haiku, Meta’s Llama), NVIDIA GPUs, Langchain, and vector databases (Pinecone, Chroma) using Splunk Observability Cloud. It outlines a six-step process:

Access Splunk Observability Cloud: Sign up for a free trial if needed.
Deploy Splunk Distribution of OpenTelemetry Collector: Use a Helm chart to install the collector in Kubernetes.
Capture NVIDIA GPU Metrics: Utilize the NVIDIA GPU Operator and Prometheus receiver in the OpenTelemetry Collector.
Instrument Python Applications: Use the Splunk Distribution of OpenTelemetry Python agent for automatic instrumentation and enable Always On Profiling.
Enhance with OpenLIT: Install and initialize OpenLIT to capture detailed trace data, including LLM calls and interactions with vector databases (with options to disable PII capture).
Start Using the Data: Leverage the collected metrics and traces, including features like Tag Spotlight, to identify and resolve performance issues (example given: OpenAI rate limits).

The article emphasizes OpenTelemetry's role in GenAI observability and highlights how Splunk Observability Cloud facilitates monitoring these complex applications, providing insights into performance, cost, and potential bottlenecks. It also points to resources for help and further information on specific aspects of the process.

2025-03-27 Tags: splunk, llm, observability, opentelemetry, nvidia, gpus, python, openlit, kubernetes by klotz

observability data lake

2025-03-25 Tags: observability, data lake, production engineering by klotz

Standardizing the Language of Observability in OpenTelemetry

The article discusses the challenges faced due to differing observability tools and naming conventions, and how OpenTelemetry's standard naming schemas can streamline workflows and enhance interoperability.

2025-03-14 Tags: opentelemetry, observability, semantic conventions, otel, production engineering by klotz

Grafana Loki Introduces v3.4 with Standardized Storage and Unified Telemetry

Grafana Loki version 3.4 introduces enhancements such as standardized storage with Thanos, a sizing guidance page, merging of Promtail into Grafana Alloy, and support for out-of-order logs.

2025-03-14 Tags: grafana, loki, observability, logs, production engineering, storage by klotz

Get Started with KubeBlocks

This skill path by Bryce Yu guides users through the basics of managing databases on Kubernetes using KubeBlocks. It covers installation, deployment, upgrades, backup, observability, and auto-tuning of database clusters.

2025-02-23 Tags: kubeblocks, kubernetes, observability, prometheus, grafana, self-hosted, production engineering by klotz

Sawmills emerges from stealth to trim enterprise observability costs and provide telemetry data sovereignty

Sawmills AI has introduced a smart telemetry data management platform aimed at reducing costs and improving data quality for enterprise observability. By acting as a middleware layer that uses AI and ML to optimize telemetry data before it reaches vendors like Datadog and Splunk, Sawmills helps companies manage data efficiently, retain data sovereignty, and reduce unnecessary data processing costs.

2025-02-20 Tags: sawmills, llm, observability, telemetry, splunk, datadog, by leveraging leading large language models (llms) and machine learning techniques, sawmills can drastically cut down the volume of data sent to observability tools, offering substantial cost savings. the platform is built on the opentelemetry collector with additio, enabling better data governance, anomaly detection, . production engineering, otel, machine learning by klotz

OpenInference

OpenInference is a set of conventions and plugins that complements OpenTelemetry to enable tracing of AI applications, with native support from arize-phoenix and compatibility with other OpenTelemetry-compatible backends.

2025-02-08 Tags: openinference, opentelemetry, observability, tracing, ai, arize, python, javascript, llm, production by klotz

Arize Phoenix

Arize Phoenix is an open-source observability library for AI experimentation, evaluation, and troubleshooting, built by Arize AI.

2025-02-08 Tags: arize phoenix, ai, observability, experiments, evaluation, troubleshooting, visualization, opentelemetry, openinference, production engineering, data engineering by klotz

First / Previous / Next / Last / Page 1 of 0

SemanticScuttle - klotz.me

Tags: observability*

Linked Tags

Related Tags