SemanticScuttle - klotz.me » Tags: llm+quantization+inference

Tags: llm* + quantization* + inference*


This article details the performance of Unsloth Dynamic GGUFs on the Aider Polyglot benchmark, showing how they quantize LLMs such as DeepSeek-V3.1 to as low as 1-bit while outperforming models like GPT-4.5 and Claude-4-Opus. It also covers the benchmark setup, comparisons with other quantization methods, and chat template bug fixes.

2025-10-13 Tags: unsloth, gguf, aider polyglot, llm, quantization, deepseek-v3.1, gpt-4, claude-4, model compression, fine-tuning, inference by klotz

Text Generation Web UI

This document details how to run Qwen models locally using the Text Generation Web UI (oobabooga), covering installation, setup, and launching the web interface.

2025-04-08 Tags: alibaba, qwen, text generation web ui, oobabooga, llm, inference, llama.cpp, transformers, quantization, python by klotz

GGUF Quantization with Imatrix and K-Quantization to Run LLMs on Your CPU

This article explains how to accurately quantize a Large Language Model (LLM) and convert it to the GGUF format for efficient CPU inference. It covers the use of an importance matrix (imatrix) and the K-Quantization method, with Gemma 2 Instruct as the example, and notes that the same approach applies to other models such as Qwen2, Llama 3, and Phi-3.

2024-09-14 Tags: gguf, quantization, llm, cpu, inference, imatrix by klotz

