SemanticScuttle - klotz.me » klotz: inference+cpu

klotz: inference* + cpu*

GGUF Quantization with Imatrix and K-Quantization to Run LLMs on Your CPU

This article explains how to accurately quantize a Large Language Model (LLM) and convert it to the GGUF format for efficient CPU inference. It covers using an importance matrix (imatrix) together with the K-Quantization method, with Gemma 2 Instruct as the worked example, and notes that the same approach applies to other models such as Qwen2, Llama 3, and Phi-3.

2024-09-14 Tags: gguf, quantization, llm, cpu, inference, imatrix by klotz
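To make the imatrix idea concrete, here is a toy sketch, not llama.cpp's actual K-quant code. It assumes the importance of each weight column is roughly the mean squared activation it sees on calibration text, and it picks a per-block scale for symmetric 4-bit quantization that minimizes importance-weighted squared error instead of plain error. The helper names (`importance_from_activations`, `quantize_block`, `weighted_error`) and the scale sweep of 0.7x to 1.3x are hypothetical choices for illustration; real K-quant formats are considerably more elaborate.

```python
from typing import List, Sequence, Tuple


def importance_from_activations(activations: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: per-column mean of squared activations over calibration rows."""
    n_rows = len(activations)
    n_cols = len(activations[0])
    return [sum(row[j] ** 2 for row in activations) / n_rows for j in range(n_cols)]


def weighted_error(weights: Sequence[float], importance: Sequence[float],
                   scale: float, q: Sequence[int]) -> float:
    """Importance-weighted squared error between weights and their dequantized values."""
    return sum(imp * (w - scale * qi) ** 2 for w, imp, qi in zip(weights, importance, q))


def quantize_block(weights: Sequence[float], importance: Sequence[float],
                   bits: int = 4, n_candidates: int = 32) -> Tuple[float, List[int]]:
    """Toy symmetric block quantizer whose scale search is guided by importance weights."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # -8..7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return 0.0, [0] * len(weights)
    best_err, best_scale, best_q = float("inf"), 0.0, []
    for i in range(n_candidates):
        # Sweep scales from 0.7x to 1.3x of the naive max-abs scale (an arbitrary choice).
        scale = (max_abs / qmax) * (0.7 + 0.6 * i / (n_candidates - 1))
        q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
        err = weighted_error(weights, importance, scale, q)
        if err < best_err:
            best_err, best_scale, best_q = err, scale, q
    return best_scale, best_q


if __name__ == "__main__":
    weights = [0.82, -0.47, 0.13, 0.05, -0.66, 0.29, -0.91, 0.38]
    # Column 2 sees large activations, so its weight should be reconstructed most carefully.
    calib = [[0.1, 0.2, 3.0, 0.1, 0.2, 0.1, 0.3, 0.2],
             [0.2, 0.1, 2.5, 0.2, 0.1, 0.2, 0.2, 0.1]]
    imp = importance_from_activations(calib)
    s_imp, q_imp = quantize_block(weights, imp)
    s_flat, q_flat = quantize_block(weights, [1.0] * len(weights))
    print("imatrix-weighted error:", weighted_error(weights, imp, s_imp, q_imp))
    print("uniform-weighted error:", weighted_error(weights, imp, s_flat, q_flat))
```

Because both runs search the same candidate scales, the importance-guided choice can never score worse than the uniform one when measured by the importance-weighted error; the difference is only in which reconstruction errors the search is willing to trade away.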

Llama 2 on CPU: inference for document Q&A

2023-07-22 Tags: llama-2, llm, cpu, inference, document, q-and a, langchain by klotz
