This article details how Nubank built an in-house logging platform to regain control over the cost and scalability of its logging infrastructure. The company initially relied on a vendor solution, but costs rose unpredictably and the setup limited both observability and data retention.
To solve this, Nubank divided the project into two major steps: **The Observability Stream** (ingestion and processing) and the **Query & Log Platform** (storage and querying).
* **Observability Stream:** Fluent Bit for data collection, a Data Buffer Service for micro-batching, and an in-house Filter & Process Service.
* **Query & Log Platform:** Trino as the query engine, AWS S3 for storage, and Parquet for data format.
The new platform currently ingests 1 trillion logs daily, stores 45 PB of searchable data with 45-day retention, and handles almost 15,000 queries daily. Nubank reports that the platform costs 50% less than comparable market solutions while giving the company greater control, scalability, and the ability to customize features. The project underscored Nubank's value of challenging the status quo and its approach of combining open-source components with in-house development.
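The micro-batching role that the summary assigns to the Data Buffer Service can be illustrated with a small sketch. This is not Nubank's code: the `MicroBatcher` class, its parameters (`max_records`, `max_age_s`), and the flush policy are all hypothetical, chosen only to show the general idea of grouping records by count or age before handing them to a downstream processor.

```python
import time
from typing import Callable, Dict, List, Optional


class MicroBatcher:
    """Hypothetical micro-batching buffer (illustration only).

    A batch is flushed when it holds `max_records` records, or when a new
    record arrives after the oldest buffered one has waited `max_age_s`.
    """

    def __init__(
        self,
        sink: Callable[[List[Dict]], None],
        max_records: int = 500,
        max_age_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sink = sink
        self._max_records = max_records
        self._max_age_s = max_age_s
        self._clock = clock
        self._buffer: List[Dict] = []
        self._first_at: Optional[float] = None

    def add(self, record: Dict) -> None:
        now = self._clock()
        # Flush first if the buffered records have already waited too long.
        if self._buffer and now - self._first_at >= self._max_age_s:
            self.flush()
        if not self._buffer:
            self._first_at = now
        self._buffer.append(record)
        # Flush as soon as the size limit is reached.
        if len(self._buffer) >= self._max_records:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered to the sink; no-op when empty."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._first_at = None
        self._sink(batch)


# Usage: collect batches in a list instead of a real downstream service.
batches: List[List[Dict]] = []
batcher = MicroBatcher(sink=batches.append, max_records=3, max_age_s=60.0)
for i in range(7):
    batcher.add({"seq": i, "message": f"event {i}"})
batcher.flush()  # drain the remainder
print([len(b) for b in batches])  # [3, 3, 1]
```

The age check here only runs when a new record arrives; a production buffer would more likely flush on a timer as well, so that a quiet stream does not hold records indefinitely.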
A guide to building a robust logging system in Python, covering structured logging, log levels, handlers, formatters, filters, and integrating logging with modern observability practices.
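As a rough illustration of how the pieces named in that guide fit together, the sketch below wires a level, a handler, a formatter, and a filter using only the standard-library `logging` module. It is not taken from the guide itself: `JsonFormatter`, `ServiceFilter`, `build_logger`, and the `service` field are hypothetical names introduced for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Formatter: render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `service` is a hypothetical field attached by ServiceFilter below.
        if hasattr(record, "service"):
            payload["service"] = record.service
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


class ServiceFilter(logging.Filter):
    """Filter: enrich every record with a static service name."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def filter(self, record: logging.LogRecord) -> bool:
        record.service = self.service
        return True  # keep the record


def build_logger(name: str, stream, level: int = logging.INFO) -> logging.Logger:
    """Handler + formatter + filter attached to a named logger."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False   # avoid duplicate output via the root logger
    logger.handlers.clear()    # make repeated calls idempotent
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(ServiceFilter("checkout"))
    logger.addHandler(handler)
    return logger


# Usage: write to an in-memory stream so the output is easy to inspect.
stream = io.StringIO()
log = build_logger("demo.orders", stream)
log.debug("dropped because it is below INFO")
log.info("order %s placed", 42)
print(stream.getvalue(), end="")
```

Emitting one JSON object per line keeps the output easy to parse for log shippers and query engines, which is the usual motivation for structured logging over free-form text.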
Articles on logging, tracing, and observability including Echopraxia, Blindsight, structured log analysis, and more.
This article provides an overview of OpenTelemetry, an open-source observability framework, and explains how to integrate it with Go applications. It covers key concepts such as logs, metrics, and traces, and demonstrates how to set up a reusable telemetry package with OpenTelemetry in Go.
Use callbacks to send output data to PostHog, Sentry, and other providers. LiteLLM provides `input_callbacks`, `success_callbacks`, and `failure_callbacks` so that data can be sent to a provider based on the status of each response.
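The idea of routing data by response status can be sketched in plain Python. The following is not LiteLLM's implementation and does not use its API: `CallbackRegistry`, `run`, and the event dictionaries are hypothetical, and only the three list names mirror the summary above. For the actual configuration, consult the LiteLLM documentation.

```python
from typing import Any, Callable, Dict, List

Callback = Callable[[Dict[str, Any]], None]


class CallbackRegistry:
    """Hypothetical dispatcher: fire callbacks before a call and on its outcome."""

    def __init__(self) -> None:
        self.input_callbacks: List[Callback] = []
        self.success_callbacks: List[Callback] = []
        self.failure_callbacks: List[Callback] = []

    def run(self, request: str, call: Callable[[str], str]) -> str:
        # Input callbacks see the request before the call is made.
        for cb in self.input_callbacks:
            cb({"request": request})
        try:
            response = call(request)
        except Exception as exc:
            # Failure callbacks receive the error, then the error propagates.
            for cb in self.failure_callbacks:
                cb({"request": request, "error": repr(exc)})
            raise
        # Success callbacks receive the response.
        for cb in self.success_callbacks:
            cb({"request": request, "response": response})
        return response


# Usage: record events in a list instead of sending them to a real provider.
events: List[tuple] = []
registry = CallbackRegistry()
registry.input_callbacks.append(lambda e: events.append(("input", e["request"])))
registry.success_callbacks.append(lambda e: events.append(("success", e["response"])))
registry.failure_callbacks.append(lambda e: events.append(("failure", e["error"])))

registry.run("hello", lambda text: text.upper())


def failing_call(text: str) -> str:
    raise ValueError("provider unavailable")


try:
    registry.run("boom", failing_call)
except ValueError:
    pass

for event in events:
    print(event)
```

Keeping separate lists per status lets error reporting go to one destination (for example an error tracker) while successful responses go to another (for example product analytics).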