klotz: quantization* + gguf*

  1. This article explains how to accurately quantize a Large Language Model (LLM) and convert it to the GGUF format for efficient CPU inference. It covers the use of an importance matrix (imatrix) and the K-quantization method, with Gemma 2 Instruct as the example, and notes that the same approach applies to other models such as Qwen2, Llama 3, and Phi-3.
    2024-09-14, by klotz
  2. This document contains inference performance results for quantized LLMs at 70B+ parameters.
    2024-06-23, by klotz
  3. Exploring Pre-Quantized Large Language Models
    2023-11-15, by klotz
