Tags: llm* + inference*


  1. This article explains how to accurately quantize a Large Language Model (LLM) and convert it to the GGUF format for efficient CPU inference. It covers using an importance matrix (imatrix) and K-Quantization method with Gemma 2 Instruct as an example, while highlighting its applicability to other models like Qwen2, Llama 3, and Phi-3.

    2024-09-14 by klotz
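    The workflow described above maps to llama.cpp's tooling; a minimal sketch, assuming llama.cpp is built locally and the model paths and calibration file are placeholders:

    ```shell
    # Convert the Hugging Face checkpoint to a GGUF in F16 (paths are placeholders)
    python convert_hf_to_gguf.py ./gemma-2-9b-it --outfile gemma-2-9b-it-f16.gguf --outtype f16

    # Compute an importance matrix (imatrix) over a calibration text file
    ./llama-imatrix -m gemma-2-9b-it-f16.gguf -f calibration.txt -o imatrix.dat

    # K-quantize to Q4_K_M, weighting the quantization by the imatrix
    ./llama-quantize --imatrix imatrix.dat gemma-2-9b-it-f16.gguf gemma-2-9b-it-Q4_K_M.gguf Q4_K_M
    ```

    The same recipe applies to the other models mentioned (Qwen2, Llama 3, Phi-3), with only the conversion step differing per architecture.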
  2. Inference.net offers LLM inference tokens for models like Llama 3.1 at a 50-90% discount from other providers. They aggregate unused compute resources from data centers to offer fast, reliable, and affordable inference services.

    inference.net is a wholesaler of LLM inference tokens for models like Llama 3.1. We provide inference services at a 50-90% discount from what you would pay together.ai or groq.

    "We sell tokens in 10 billion token increments. The current cost per 10 billion tokens for an 8B model is $200."

    2024-08-14 by klotz
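    At the quoted rate, the per-token price works out as follows (a quick arithmetic check, not a statement of current pricing):

    ```python
    # Quoted rate: $200 per 10 billion tokens for an 8B model
    price_per_batch = 200.0
    tokens_per_batch = 10_000_000_000

    per_token = price_per_batch / tokens_per_batch    # $2e-8 per token
    per_million = per_token * 1_000_000               # $0.02 per million tokens

    print(f"${per_million:.2f} per million tokens")   # → $0.02 per million tokens
    ```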
  3. The author explores the use of Gemma 2 and Mozilla's llamafile on AWS Lambda for serverless AI inference.

    2024-07-08 by klotz
  4. Explore the best LLM inference engines and servers available to deploy and serve LLMs in production, including vLLM, TensorRT-LLM, Triton Inference Server, RayLLM with RayServe, and HuggingFace Text Generation Inference.

    2024-06-21 by klotz
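    As one example from that list, vLLM ships an OpenAI-compatible HTTP server; a minimal sketch, assuming vLLM is installed, a GPU is available, and the model name is a placeholder:

    ```shell
    # Start an OpenAI-compatible server (model name is a placeholder)
    python -m vllm.entrypoints.openai.api_server \
      --model meta-llama/Meta-Llama-3-8B-Instruct --port 8000

    # Query it with the standard completions endpoint
    curl http://localhost:8000/v1/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "prompt": "Hello", "max_tokens": 32}'
    ```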
  5. Podman AI Lab is an easy way to work with Large Language Models (LLMs) on your local developer workstation. It provides a catalog of recipes and a curated list of open-source models, and lets you experiment with and compare models with Podman AI Lab.

    2024-05-11 by klotz
  6. 2023-12-29 by klotz
  7. 2023-11-18 by klotz
  8. 2023-10-13 by klotz
  9. 2023-07-22 by klotz
  10. 2023-06-05 by klotz


SemanticScuttle - klotz.me: tagged with "llm+inference"

About - Propulsed by SemanticScuttle