klotz: quantization* + machine learning*


  1. A deep dive into the process of LLM inference, covering tokenization, transformer architecture, KV caching, and optimization techniques for efficient text generation.
  2. An in-depth look at the architecture of OpenAI's GPT-OSS models, detailing tokenization, embeddings, transformer blocks, Mixture of Experts, attention mechanisms (GQA and RoPE), and quantization techniques.
  3. This article details the often overlooked cost of storing embeddings for RAG systems, and how quantization techniques (int8 and binary) can significantly reduce storage requirements and improve retrieval speed without substantial accuracy loss.
  4. Introduces sqlite-vec, a new SQLite extension for vector search written entirely in C. The release is stable, can be installed in multiple ways, runs on various platforms, is fast, and supports quantization techniques for efficient storage and search.
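
As a rough illustration of the int8 and binary embedding quantization mentioned in item 3, the sketch below shrinks one small vector and compares byte counts. It is not taken from any of the bookmarked articles: the function names, the symmetric per-vector scale, and the sample vector are all assumptions chosen for illustration.

```python
import struct

def quantize_int8(vec):
    """Scalar-quantize floats to int8 codes with a symmetric per-vector scale (illustrative choice)."""
    max_abs = max(abs(x) for x in vec) or 1.0  # avoid a zero scale for an all-zero vector
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale

def dequantize_int8(codes, scale):
    """Approximate the original floats from int8 codes."""
    return [c * scale for c in codes]

def quantize_binary(vec):
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    out = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def hamming_distance(a, b):
    """Count differing bits between two packed binary vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Hypothetical 8-dimensional embedding; real embeddings have hundreds of dimensions.
embedding = [0.12, -0.5, 0.33, 0.0, -0.07, 0.9, -0.81, 0.44]

codes, scale = quantize_int8(embedding)
packed = quantize_binary(embedding)

# Storage cost of each representation, in bytes.
float32_bytes = len(struct.pack(f"{len(embedding)}f", *embedding))
int8_bytes = len(struct.pack(f"{len(codes)}b", *codes))
binary_bytes = len(packed)

print(float32_bytes, int8_bytes, binary_bytes)
```

Under these assumptions, int8 stores one byte per dimension instead of four, and binary stores one bit per dimension, with Hamming distance standing in for the similarity measure at search time. How much retrieval accuracy survives depends on the embedding model and on whether results are rescored with full-precision vectors, which this sketch does not attempt to show.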
