Tags: quantization* + gguf*

3 bookmark(s) - Sort by: Date ↓

  1. This article explains how to accurately quantize a Large Language Model (LLM) and convert it to the GGUF format for efficient CPU inference. It covers using an importance matrix (imatrix) and K-Quantization method with Gemma 2 Instruct as an example, while highlighting its applicability to other models like Qwen2, Llama 3, and Phi-3.
    2024-09-14 by klotz
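    The imatrix + K-quantization workflow summarized above can be sketched with llama.cpp's command-line tools. This is a hedged outline, not the article's exact procedure: the tool names (convert_hf_to_gguf.py, llama-imatrix, llama-quantize), file names, and the Q4_K_M preset are assumptions based on recent llama.cpp releases and may differ between versions.

    ```shell
    # Sketch of quantizing an LLM (e.g. Gemma 2 Instruct) to GGUF with an
    # importance matrix. All paths and model names below are placeholders.

    # 1. Convert the Hugging Face checkpoint to a 16-bit GGUF file.
    python convert_hf_to_gguf.py ./gemma-2-9b-it --outfile gemma-2-9b-it-f16.gguf

    # 2. Compute an importance matrix from a representative calibration text,
    #    so quantization error is weighted toward the most-used weights.
    llama-imatrix -m gemma-2-9b-it-f16.gguf -f calibration.txt -o imatrix.dat

    # 3. K-quantize guided by the imatrix; Q4_K_M is a common size/quality
    #    tradeoff for CPU inference.
    llama-quantize --imatrix imatrix.dat gemma-2-9b-it-f16.gguf \
        gemma-2-9b-it-Q4_K_M.gguf Q4_K_M
    ```

    The same three steps apply to other architectures mentioned in the article (Qwen2, Llama 3, Phi-3), with only the checkpoint path changing.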
  2. This document contains the quantized LLM inference performance results on 70b+ models.
    2024-06-23 by klotz
  3. Exploring Pre-Quantized Large Language Models
    2023-11-15 by klotz


SemanticScuttle - klotz.me: tagged with "quantization+gguf"
