This article details how to accelerate deep learning and LLM inference using Apache Spark, focusing on distributed inference strategies. It covers basic deployment with `predict_batch_udf`, advanced deployment with inference servers like NVIDIA Triton and vLLM, and deployment on cloud platforms like Databricks and Dataproc. It also provides guidance on resource management and configuration for optimal performance.
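The basic deployment path mentioned above centers on Spark's `predict_batch_udf` (available in `pyspark.ml.functions` since Spark 3.4). As a rough sketch of the pattern it expects, here is a hypothetical predict-function factory; the stand-in "model" that doubles its input is an assumption for illustration, not from the article:

```python
import numpy as np

# predict_batch_udf takes a factory that runs once per Python worker,
# so the model is loaded a single time and then reused across batches.
def make_predict_fn():
    # Hypothetical stand-in model that doubles its input; a real job
    # would load a PyTorch/TensorFlow checkpoint here instead.
    def predict(inputs: np.ndarray) -> np.ndarray:
        return inputs * 2.0
    return predict

# With a Spark session available, the factory is wrapped into a UDF
# and applied to a DataFrame column, e.g.:
#   from pyspark.ml.functions import predict_batch_udf
#   from pyspark.sql.types import FloatType
#   udf = predict_batch_udf(make_predict_fn,
#                           return_type=FloatType(),
#                           batch_size=1024)
#   df = df.withColumn("pred", udf("feature"))
```

Batching inputs into numpy arrays this way is what lets Spark amortize model-invocation overhead across rows instead of calling the model once per row.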
A walkthrough on building a question-and-answer (Q&A) pipeline using various tools, and on distributing it with ModelKits for collaboration.
This article explores the transformer architecture behind Llama 3, a large language model released by Meta, and discusses how to leverage its power for enterprise and grassroots-level use. It also delves into the technical details of Llama 3 and its prospects for the GenAI ecosystem.
This article provides an introduction to MLflow, an open-source platform for end-to-end machine learning lifecycle management. It focuses on using MLflow as an orchestrator for machine learning pipelines, explaining why managing complex pipelines matters in machine learning projects.