"Prove AI is a self-hosted solution designed to accelerate GenAI performance monitoring. It allows AI engineers to capture, customize, and monitor GenAI metrics on their own terms, without vendor lock-in. Built on OpenTelemetry, Prove AI connects to existing OpenTelemetry pipelines and surfaces meaningful metrics quickly.
Key features include a unified web-based interface for consolidating performance metrics like token throughput, latency distributions, and service health. It enables faster debugging, improved time-to-metric, and better measurement of GenAI ROI. The platform is open-source, free to deploy, and offers full control over telemetry data."
Distributed tracing is crucial for modern observability, offering richer context than logs. However, the volume of tracing data can be overwhelming. Sampling addresses this by selectively retaining data, with two main approaches: head sampling (deciding upfront, when a trace starts) and tail sampling (deciding after all of a trace's spans have been collected). Head sampling is simpler but can miss localized issues. Tail sampling, while more accurate, is complex to implement at scale: it requires buffering and stateful processing, and it can affect system resilience. Furthermore, sampling inherently distorts RED metrics (request rate, error rate, duration), so those metrics must be materialized *before* sampling.
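The difference between the two sampling approaches, and the reason RED metrics need to be computed before sampling, can be illustrated with a minimal sketch. The `Span` type, the 500 ms "slow" threshold, and all function names here are hypothetical and do not come from the article or from any tracing library.

```python
import hashlib
import statistics
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    duration_ms: float
    error: bool = False


def head_sample(trace_id: str, rate: float) -> bool:
    """Head sampling: decide from the trace ID alone, before any span is seen."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < rate * 10_000


def tail_sample(spans: list[Span], slow_ms: float = 500.0) -> bool:
    """Tail sampling: decide once the whole trace is buffered (keep errors and slow traces)."""
    return any(s.error or s.duration_ms >= slow_ms for s in spans)


def red_metrics(root_spans: list[Span], window_s: float) -> dict[str, float]:
    """Rate, error ratio, and median duration over one root span per request."""
    n = len(root_spans)
    if n == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "median_ms": 0.0}
    return {
        "rate_per_s": n / window_s,
        "error_ratio": sum(s.error for s in root_spans) / n,
        "median_ms": statistics.median(s.duration_ms for s in root_spans),
    }
```

Calling `red_metrics` on every root span gives the true picture. Calling it only on traces that passed `tail_sample` inflates the error ratio and the median duration, because the sampler deliberately keeps the unusual traces.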
Logs, metrics, and traces aren't enough. AI apps require visibility into prompts and completions to track everything from security risks to hallucinations.
Cisco and Splunk have introduced the Cisco Time Series Model, a univariate, zero-shot time series foundation model designed for observability and security metrics. It is released as an open-weight checkpoint on Hugging Face. The model targets settings where:
* **Multiresolution data is common:** The model handles data where fine-grained (e.g., 1-minute) and coarse-grained (e.g., hourly) data coexist, a typical pattern in observability platforms where older data is often aggregated.
* **Long context windows are needed:** It's built to leverage longer historical data (up to 16384 points) than many existing time series models, improving forecasting accuracy.
* **Zero-shot forecasting is desired:** The model aims to provide accurate forecasts *without* requiring task-specific fine-tuning, making it readily applicable to a variety of time series datasets.
* **Quantile forecasting is important:** It predicts not just the mean forecast but also a range of quantiles (0.1 to 0.9), providing a measure of uncertainty.
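Quantile forecasts such as the 0.1 to 0.9 range mentioned above are commonly scored with the pinball (quantile) loss and checked for interval coverage. The sketch below shows those two standard measures in plain Python. It is not taken from the model's code, and the summary does not say how the model is trained or evaluated, so the function names are illustrative only.

```python
def pinball_loss(y_true: list[float], y_pred: list[float], q: float) -> float:
    """Average pinball loss of predictions y_pred for quantile level q (0 < q < 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


def interval_coverage(y_true: list[float], lower: list[float], upper: list[float]) -> float:
    """Fraction of actual values that fall inside the [lower, upper] band."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)
```

For a well-calibrated forecaster, the band between the 0.1 and 0.9 quantiles should contain roughly 80% of the observed values, which is what `interval_coverage` measures.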
The company's transition from fragmented observability tools to a unified system using OpenTelemetry and OneUptime dramatically improved incident response times, reducing MTTR from 41 to 9 minutes. By correlating logs, metrics, and traces through structured logging and intelligent sampling, they eliminated much of the noise and confusion that previously slowed root cause analysis. The shift also reduced the number of dashboards engineers needed to check per incident and significantly lowered the percentage of incidents with unknown causes.
Key practices included instrumenting once with OpenTelemetry, enforcing cardinality limits, and archiving raw data for future analysis. The move away from 100% trace capture and over-instrumentation helped manage data volume while maintaining visibility into anomalies. This transformation emphasized that effective observability isn't about collecting more data, but about designing correlated signals that support intentional diagnosis and reduce cognitive load.
This article provides an overview of OpenTelemetry, an open-source observability framework, and explains how to integrate it with Go applications. It covers key concepts like logs, metrics, and traces, and demonstrates setting up a reusable telemetry package using OpenTelemetry in Go.
OpenTelemetry is not just an observability platform; it's a set of best practices and standards that can be integrated into platform engineering or DevOps.
This article explores various metrics used to evaluate the performance of classification machine learning models, including precision, recall, F1-score, accuracy, and alert rate. It explains how these metrics are calculated and provides insights into their application in real-world scenarios, particularly in fraud detection.
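As a quick reference, the metrics listed above can all be derived from the four confusion-matrix counts. This is a minimal sketch with binary labels where 1 means fraud. The definition of alert rate as the share of all cases flagged positive is an assumption based on common fraud-detection usage, not necessarily the article's exact wording.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute common binary classification metrics from labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    total = tp + fp + fn + tn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": (tp + tn) / total if total else 0.0,
        # Assumed definition: fraction of all cases the model flags for review.
        "alert_rate": (tp + fp) / total if total else 0.0,
    }
```

In a fraud setting the alert rate is useful next to precision and recall because it approximates the review workload the model creates.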
This article discusses the importance of understanding and memorizing classification metrics in machine learning. The author shares their own experience and strategies for memorizing metrics such as accuracy, precision, recall, F1 score, and ROC AUC.
The article explains how to apply Friedman's h-statistic to determine whether complex machine learning models use interactions to make predictions. It uses the artemis package and interprets the pairwise, overall, and unnormalised metrics.
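To make the pairwise statistic concrete, here is a from-scratch sketch of Friedman's H² for two features, built on brute-force partial dependence. It does not use the artemis API. The function name is hypothetical, and the "unnormalised" value returned here (the square root of the numerator) is one common variant, assumed rather than confirmed to match the article's definition.

```python
import math
from typing import Callable, Sequence


def pairwise_h_statistic(
    predict: Callable[[list[float]], float],
    X: Sequence[Sequence[float]],
    j: int,
    k: int,
) -> tuple[float, float]:
    """Return (H^2, unnormalised H) for features j and k of a model `predict`."""
    n = len(X)

    def centered_pd(features: tuple[int, ...]) -> list[float]:
        # Partial dependence at each data point: fix the chosen features to the
        # row's values and average predictions over all other rows.
        values = []
        for row in X:
            total = 0.0
            for other in X:
                z = list(other)
                for c in features:
                    z[c] = row[c]
                total += predict(z)
            values.append(total / n)
        mean = sum(values) / n
        return [v - mean for v in values]

    pd_jk = centered_pd((j, k))
    pd_j = centered_pd((j,))
    pd_k = centered_pd((k,))

    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a * a for a in pd_jk)
    h2 = numerator / denominator if denominator else 0.0
    return h2, math.sqrt(numerator)
```

A purely additive model yields H² of 0, while a model driven entirely by the product of the two features yields H² of 1. The cost is quadratic in the number of rows, so in practice the statistic is usually estimated on a sample.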