SemanticScuttle - klotz.me » klotz: quantization+gguf

GGUF Quantization with Imatrix and K-Quantization to Run LLMs on Your CPU

This article explains how to accurately quantize a Large Language Model (LLM) and convert it to the GGUF format for efficient CPU inference. It covers the use of an importance matrix (imatrix) and the K-Quantization method, with Gemma 2 Instruct as the example, and notes that the same approach applies to other models such as Qwen2, Llama 3, and Phi-3.

2024-09-14 Tags: gguf, quantization, llm, cpu, inference, imatrix by klotz

Artifacts Quantized LLM Inference Performance Results on 70b+ Models

This document contains the quantized LLM inference performance results on 70b+ models.

2024-06-23 Tags: artifacts, quantization, llm, gguf by klotz

Which Quantization Method is Right for You? (GPTQ vs. GGUF vs. AWQ)

Exploring Pre-Quantized Large Language Models

2023-11-15 Tags: llm, quantization, gguf by klotz
