SemanticScuttle - klotz.me » Tags: transformers+quantization

Tags: transformers* + quantization*

0 bookmark(s) - Sort by: Date ↓ / Title /

DeepSeek-R1-0528-Qwen3-8B-GGUF

This page details the DeepSeek-R1-0528-Qwen3-8B model, a quantized version of DeepSeek-R1-0528, highlighting its improved reasoning capabilities, evaluation results, usage guidelines, and licensing information. It offers various quantization options (GGUF) for local execution.

2025-05-30 Tags: deepseek-r1, qwen3, gguf, llm, quantization, reasoning, text generation, transformers, model card, mcp, huggingface by klotz
Why Your RAG Embeddings Are Costing You a Fortune (And How I Fixed It)

This article details the often overlooked cost of storing embeddings for RAG systems, and how quantization techniques (int8 and binary) can significantly reduce storage requirements and improve retrieval speed without substantial accuracy loss.

2025-04-30 Tags: rag, embedding, vector database, transformers, llm, quantization by klotz
Text Generation Web UI

This document details how to run Qwen models locally using the Text Generation Web UI (oobabooga), covering installation, setup, and launching the web interface.

2025-04-08 Tags: alibaba, qwen, text generation web ui, oobabooga, llm, inference, llama.cpp, transformers, quantization, python by klotz

First / Previous / Next / Last / Page 1 of 0