This article discusses how to improve the performance of Pandas operations through NumPy vectorization. It highlights alternatives to the apply() method on larger DataFrames and shows how lesser-known NumPy functions such as np.where and np.select can handle complex if/then/else conditions efficiently.
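A minimal sketch of the idea: np.where covers a single if/else, while np.select handles if/elif/else chains, both without a row-wise apply(). The DataFrame and column names here are invented for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [50, 120, 300, 80]})  # toy data

# Single condition: a vectorized replacement for a row-wise apply()
df["tier"] = np.where(df["price"] > 100, "premium", "standard")

# Multiple conditions: np.select evaluates them in order, like if/elif/else
conditions = [df["price"] < 100, df["price"] < 200]
choices = ["budget", "mid"]
df["band"] = np.select(conditions, choices, default="luxury")
```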
The article explores 11 essential tips for leveraging the full potential of the Pandas library to boost productivity and streamline workflows when handling and analyzing complex datasets. It uses a real-world Airbnb listings dataset from Kaggle to illustrate techniques such as chunked processing and parallel execution.
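As a rough sketch of the chunked-processing and parallel-execution tips, assuming a local listings.csv with a price column (both hypothetical), pandas can stream the file in pieces while a process pool summarizes the chunks concurrently:

```python
import pandas as pd
from concurrent.futures import ProcessPoolExecutor

def total_price(chunk: pd.DataFrame) -> float:
    return chunk["price"].sum()  # "price" is an assumed column name

if __name__ == "__main__":
    # chunksize streams the CSV instead of loading it all at once
    chunks = pd.read_csv("listings.csv", chunksize=100_000)
    with ProcessPoolExecutor() as pool:
        partial_sums = pool.map(total_price, chunks)
    print(sum(partial_sums))
```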
The article discusses the importance of tuning machine learning models for optimal inference performance and explores popular serving and optimization tools such as vLLM, TensorRT, ONNX Runtime, TorchServe, and DeepSpeed.
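For a flavor of one of those tools, here is a minimal ONNX Runtime inference sketch; the model file name and the 1x3x224x224 input shape are placeholders, not from the article:

```python
import numpy as np
import onnxruntime as ort

# "model.onnx" and the input shape below are hypothetical
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {input_name: x})  # None = return all outputs
print(outputs[0].shape)
```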
This article discusses the TCP TIME-WAIT state on Linux systems, explaining its purpose, potential problems, and various solutions to handle a large number of connections.
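One standard mitigation in this area is letting a restarted server rebind its listening port while sockets from the previous instance linger in TIME-WAIT; a minimal Python sketch (the port number is arbitrary):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# SO_REUSEADDR lets the listener rebind even while connections from a
# previous instance of the server are still in TIME-WAIT
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("0.0.0.0", 8080))
sock.listen(128)
```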
This post introduces rusage.com, a program that reports richer resource-usage statistics than the traditional UNIX time command, with examples and use cases for optimizing performance.
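For comparison, a small Python sketch of the same idea using the standard resource module; the child command and file are arbitrary examples, not taken from the post:

```python
import resource
import subprocess

# Run a child command, then report the resources it consumed
subprocess.run(["sort", "/usr/share/dict/words"], stdout=subprocess.DEVNULL)
ru = resource.getrusage(resource.RUSAGE_CHILDREN)
print(f"user CPU:   {ru.ru_utime:.3f}s")
print(f"system CPU: {ru.ru_stime:.3f}s")
print(f"max RSS:    {ru.ru_maxrss}")  # kilobytes on Linux, bytes on macOS
```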
Raspberry Pi engineers have improved SDRAM timings and other memory settings, leading to a 10-20% performance boost on the Pi 5. Further overclocking can result in up to a 32% speed improvement. The changes may be rolled out in a firmware update for Pi 4 and Pi 5 users.
The article discusses the challenges of scaling Retrieval-Augmented Generation (RAG) from a proof of concept (POC) to production, covering key issues such as performance, data management, risk, workflow integration, and cost. It also outlines the architectural components needed to overcome these challenges, including scalable vector databases, caching mechanisms, advanced search techniques, responsible AI layers, and API gateways.
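To illustrate the caching component, here is a toy in-memory query cache in front of a stubbed-out vector search; a production system would use something like Redis with TTLs, and vector_search here is a hypothetical stand-in, not the article's implementation:

```python
import hashlib

def vector_search(query: str) -> list[str]:
    # Stand-in for a real vector-database top-k similarity search
    return [f"doc for: {query}"]

class QueryCache:
    """Toy in-memory cache for retrieval results, keyed on normalized
    query text. Illustrative only."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}

    def _key(self, query: str) -> str:
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get(self, query: str):
        return self._store.get(self._key(query))

    def put(self, query: str, docs: list[str]) -> None:
        self._store[self._key(query)] = docs

def retrieve(query: str, cache: QueryCache) -> list[str]:
    if (hit := cache.get(query)) is not None:
        return hit                      # cache hit: skip the vector search
    docs = vector_search(query)
    cache.put(query, docs)
    return docs
```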
This repository contains scripts for benchmarking the performance of large language models (LLMs) served using vLLM.
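In the same spirit, a bare-bones throughput probe against vLLM's OpenAI-compatible completions endpoint; the URL, model name, concurrency, and request count are assumptions, not the repository's scripts:

```python
import concurrent.futures
import time

import requests

URL = "http://localhost:8000/v1/completions"    # assumed local vLLM server
PAYLOAD = {"model": "meta-llama/Llama-3.1-8B",  # assumed model name
           "prompt": "Hello", "max_tokens": 64}

def one_request(_: int) -> float:
    start = time.perf_counter()
    resp = requests.post(URL, json=PAYLOAD, timeout=120)
    resp.raise_for_status()
    return time.perf_counter() - start

wall_start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
    latencies = sorted(pool.map(one_request, range(64)))
wall = time.perf_counter() - wall_start

print(f"p50 latency: {latencies[len(latencies) // 2]:.2f}s")
print(f"throughput:  {len(latencies) / wall:.1f} req/s")
```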
A startup called Backprop has demonstrated that a single Nvidia RTX 3090 GPU, released in 2020, can handle serving a modest large language model (LLM) like Llama 3.1 8B to over 100 concurrent users with acceptable throughput. This suggests that expensive enterprise GPUs may not be necessary for scaling LLMs to a few thousand users.
This study investigates whether format restrictions such as JSON or XML output impact the performance of large language models (LLMs) on tasks like reasoning and domain-knowledge comprehension.