klotz: vector database*

0 bookmark(s) - Sort by: Date ↓ / Title / - Bookmarks from other users for this tag

  1. This article explores different chunking strategies for Retrieval-Augmented Generation (RAG) systems, comparing nine approaches using the agenticmemory library to improve retrieval accuracy and reduce hallucinations.
  2. Amazon S3 Vectors is now generally available with increased scale and production-grade performance capabilities. It offers native support to store and query vector data, potentially reducing costs by up to 90% compared to specialized vector databases.
  3. A tutorial on building a private, offline Retrieval Augmented Generation (RAG) system using Ollama for embeddings and language generation, and FAISS for vector storage, ensuring data privacy and control.

    1. **Document Loader:** Extracts text from various file formats (PDF, Markdown, HTML) while preserving metadata like source and page numbers for accurate citations.
    2. **Text Chunker:** Splits documents into smaller text segments (chunks) to manage token limits and improve retrieval accuracy. It uses overlapping and sentence boundary detection to maintain context.
    3. **Embedder:** Converts text chunks into numerical vectors (embeddings) using the `nomic-embed-text` model via Ollama, which runs locally without internet access.
    4. **Vector Database:** Stores the embeddings using FAISS (Facebook AI Similarity Search) for fast similarity search. It uses cosine similarity for accurate retrieval and saves the database to disk for quick loading in future sessions.
    5. **Large Language Model (LLM):** Generates answers using the `llama3.2` model via Ollama, also running locally. It takes the retrieved context and the user's question to produce a response with citations.
    6. **RAG System Orchestrator:** Coordinates the entire workflow, managing the ingestion of documents (loading, chunking, embedding, storing) and the querying process (retrieving relevant chunks, generating answers).
  4. This article explains the internal workings of vector databases, highlighting that they don't perform a brute-force search as commonly described. It details algorithms like HNSW, IVF, and PQ, the tradeoffs between recall, speed, and memory, and how different RAG patterns impact vector database usage. It also discusses production challenges like filtering, updates, and sharding.
  5. Sparse Priming Representations (SPR) is a research project focused on developing and sharing techniques for efficiently representing complex ideas, memories, or concepts using a minimal set of keywords, phrases, or statements, enabling language models or subject matter experts to quickly reconstruct the original idea with minimal context.
  6. Scaling a simple RAG pipeline from simple notes to full books. This post elaborates on how to utilize larger files with your RAG pipeline by adding an extra step to the process — chunking.
  7. Unlock the power of 301 redirects at scale LLMs to enhance user experience and optimize your website's SEO strategy.
    2025-05-31 Tags: , , , , by klotz
  8. This article details the often overlooked cost of storing embeddings for RAG systems, and how quantization techniques (int8 and binary) can significantly reduce storage requirements and improve retrieval speed without substantial accuracy loss.
  9. This article details building a Retrieval-Augmented Generation (RAG) system to assist with research paper tasks, specifically question answering over a PDF document. It covers document loading, splitting, embedding with Sentence Transformers, using ChromaDB as a vector database, and implementing a query interface with LangChain.
  10. This tutorial demonstrates how to build a powerful document search engine using Hugging Face embeddings, Chroma DB, and Langchain for semantic search capabilities.

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: Tags: vector database

About - Propulsed by SemanticScuttle