SemanticScuttle - klotz.me » Tags: performance+production engineering

Tags: performance* + production engineering*

0 bookmark(s) - Sort by: Date ↓ / Title /

Scaling RAG from POC to Production

The article discusses the challenges and components required to scale Retrieval Augmented Generation (RAG) from a Proof of Concept (POC) to production. It covers key issues such as performance, data management, risk, integration into workflows, and cost. It also outlines architectural components such as scalable vector databases, caching mechanisms, advanced search techniques, responsible AI layers, and API gateways needed for overcoming these challenges.

2024-10-08 Tags: rag, performance, production engineering by klotz
Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands

A startup called Backprop has demonstrated that a single Nvidia RTX 3090 GPU, released in 2020, can handle serving a modest large language model (LLM) like Llama 3.1 8B to over 100 concurrent users with acceptable throughput. This suggests that expensive enterprise GPUs may not be necessary for scaling LLMs to a few thousand users.

2024-08-24 Tags: nvidia, rtx 3090, llm, gpu, performance, benchmark, llama 3.1 8b, vllm, production engineering, backprop.co by klotz
Types of commands - Splunk Documentation

Distributable streaming

2023-02-03 Tags: splunk, execution model, performance, data engineering, production engineering by klotz
Improve Splunk Performance and Lower CPU Usage - Cribl

2021-07-14 Tags: cribl, splunk, performance, logs, metrics, production engineering by klotz
Running Apache Kafka over Istio - benchmark · Banzai Cloud

2019-07-29 Tags: istio, kafal, performance, service mesh, kubernetes, operator, banzaicloud, production engineering by klotz
Benchmarking Istio & Linkerd CPU – Michael Kipper – Medium

Overall, though, the Istio/Envoy proxies use ~50% more CPU than Linkerd.

2019-06-19 Tags: linkerd, istio, supergloo, service mesh, kubernetes, performance, production engineering by klotz
Performance Benchmark Analysis of Istio and Linkerd | Kinvolk

2019-06-19 Tags: linkerd, istio, service mesh, performance, kubernetes, production engineering by klotz
Zero Downtime AWS ALB Deployments - AppOptics Blog

2019-05-31 Tags: aws, alb, load balancer, performance, deployment, production engineering by klotz
Meshery - a multi-service mesh performance benchmark and playground | Layer5

2019-05-20 Tags: meshery, service mesh, kubernetes, lee calcote, performance, benchmark, istio, linkerd, production engineering by klotz
[D] Things To Avoid When Running Tensorflow in Docker on Kubernetes

2018-09-15 Tags: kubernetes, tensorflow, performance, production engineering by klotz

Top of the page

First / Previous / Next / Last / Page 1 of 0

About - Propulsed by SemanticScuttle