SemanticScuttle - klotz.me » Tags: retrieval-augmented generation+embeddings

Tags: retrieval-augmented generation* + embeddings*


Understanding Context and Contextual Retrieval in RAG

RAG combines language models with external knowledge. This article explores context and retrieval in RAG: search methods (keyword matching, TF-IDF, and embeddings with FAISS or Chroma), context-length challenges (compression, re-ranking), and contextual retrieval that draws on both the query and the conversation history.

2026-03-08 Tags: rag, retrieval-augmented generation, context, contextual retrieval, semantic search, embeddings, faiss, chroma, llm, large language models, knowledge retrieval by klotz

Amazon S3 Vectors now generally available with increased scale and performance

Amazon S3 Vectors is now generally available with increased scale and production-grade performance capabilities. It offers native support to store and query vector data, potentially reducing costs by up to 90% compared to specialized vector databases.

2025-12-08 Tags: s3 vectors, vector database, ai, machine learning, embeddings, rag, amazon bedrock, amazon opensearch, cloud storage, aws by klotz

Building a RAG System That Runs Completely Offline

A tutorial on building a private, offline Retrieval Augmented Generation (RAG) system using Ollama for embeddings and language generation, and FAISS for vector storage, ensuring data privacy and control.

1. **Document Loader:** Extracts text from various file formats (PDF, Markdown, HTML) while preserving metadata like source and page numbers for accurate citations.
2. **Text Chunker:** Splits documents into smaller text segments (chunks) to manage token limits and improve retrieval accuracy. It uses overlapping and sentence boundary detection to maintain context.
3. **Embedder:** Converts text chunks into numerical vectors (embeddings) using the `nomic-embed-text` model via Ollama, which runs locally without internet access.
4. **Vector Database:** Stores the embeddings using FAISS (Facebook AI Similarity Search) for fast similarity search. It uses cosine similarity for accurate retrieval and saves the database to disk for quick loading in future sessions.
5. **Large Language Model (LLM):** Generates answers using the `llama3.2` model via Ollama, also running locally. It takes the retrieved context and the user's question to produce a response with citations.
6. **RAG System Orchestrator:** Coordinates the entire workflow, managing the ingestion of documents (loading, chunking, embedding, storing) and the querying process (retrieving relevant chunks, generating answers).
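The chunking and retrieval steps in the list above can be sketched in a few lines. This is a minimal illustration, not the tutorial's code: `chunk_sentences`, `toy_embed`, and `top_k` are hypothetical names, and `toy_embed` is a hashed bag-of-words stand-in for the `nomic-embed-text` model served by Ollama. Scoring unit-length vectors by inner product is the usual way to get cosine similarity out of an inner-product index.

```python
import math
import re
import zlib


def chunk_sentences(text, max_chars=200, overlap_sentences=1):
    """Group sentences into chunks of roughly max_chars, then prepend the last
    `overlap_sentences` sentences of the previous chunk to each later chunk.
    Chunks can exceed max_chars by the size of the overlap."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    groups, current = [], []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_chars:
            groups.append(current)
            current = []
        current.append(sentence)
    if current:
        groups.append(current)
    chunks = []
    for i, group in enumerate(groups):
        carry = groups[i - 1][-overlap_sentences:] if i > 0 and overlap_sentences > 0 else []
        chunks.append(" ".join(carry + group))
    return chunks


def toy_embed(text, dim=64):
    """Stand-in embedder (assumption): hashed token counts, scaled to unit length."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def top_k(query, chunks, k=2):
    """Rank chunks by inner product of unit vectors, which equals cosine similarity."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, toy_embed(c))), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


text = "Alpha is first. Beta is second. Gamma is third. Delta is fourth."
chunks = chunk_sentences(text, max_chars=35, overlap_sentences=1)
results = top_k("Gamma third", chunks, k=1)
```

With `max_chars=35`, the sample text splits into two chunks, and the second one begins with the repeated sentence "Beta is second." so that context carries across the boundary.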

2025-11-15 Tags: rag, self-hosted, llm, ollama, faiss, embeddings, vector database, hackernoon by klotz

How I Built Lightning-Fast Vector Search for Legal Documents

This article details the process of building a fast vector search system for a large legal dataset (Australian High Court decisions). It covers choosing embedding providers, performance benchmarks, using USearch and Isaacus embeddings, and the importance of API terms of service. It focuses on achieving speed and scalability while maintaining reasonable accuracy.

2025-10-21 Tags: vector search, embeddings, legal documents, usearch, isaacus, performance, scalability, nlp, information retrieval, rag by klotz

A VectorDB Doesn’t Actually Work the Way You Think It Does

This article explains the internal workings of vector databases, highlighting that they don't perform a brute-force search as commonly described. It details algorithms like HNSW, IVF, and PQ, the tradeoffs between recall, speed, and memory, and how different RAG patterns impact vector database usage. It also discusses production challenges like filtering, updates, and sharding.
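To make the contrast with brute-force search concrete, here is a rough sketch of the inverted-file (IVF) idea: vectors are bucketed by their nearest centroid, and a query scans only the `nprobe` closest buckets. The class name `TinyIVF` and all parameters are illustrative assumptions. Centroids are simply sampled from the data here, whereas production IVF indexes typically train them with k-means.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force(vectors, query, k=3):
    """Exact search: score every vector."""
    return sorted(range(len(vectors)), key=lambda i: (sq_dist(query, vectors[i]), i))[:k]


class TinyIVF:
    def __init__(self, vectors, n_cells=8, seed=0):
        rng = random.Random(seed)
        self.vectors = vectors
        # Simplification: use sampled data points as centroids instead of k-means.
        self.centroids = rng.sample(vectors, n_cells)
        self.cells = [[] for _ in range(n_cells)]
        for idx, vec in enumerate(vectors):
            self.cells[self._nearest_cells(vec, 1)[0]].append(idx)

    def _nearest_cells(self, vec, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(vec, self.centroids[c]))
        return order[:n]

    def search(self, query, k=3, nprobe=1):
        """Scan only the nprobe nearest cells; return (ids, number of vectors scanned)."""
        candidates = [i for c in self._nearest_cells(query, nprobe) for i in self.cells[c]]
        ranked = sorted(candidates, key=lambda i: (sq_dist(query, self.vectors[i]), i))
        return ranked[:k], len(candidates)


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
query = [rng.random() for _ in range(8)]

index = TinyIVF(data, n_cells=10)
approx_ids, scanned = index.search(query, k=3, nprobe=2)
exact_ids = brute_force(data, query, k=3)
full_ids, full_scanned = index.search(query, k=3, nprobe=10)
```

Raising `nprobe` scans more cells, trading speed for recall. Probing every cell reduces to the exact search, while small values can miss true neighbors that fell into an unprobed cell.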

2025-10-03 Tags: vector database, vector search, hnsw, ivf, pq, rag, approximate nearest neighbor, ai, embeddings, semantic search by klotz

Google DeepMind Finds a Fundamental Bug in RAG: Embedding Limits Break Retrieval at Scale

Google DeepMind research reveals a fundamental architectural limitation in Retrieval-Augmented Generation (RAG) systems related to fixed-size embeddings. The research demonstrates that retrieval performance degrades as database size increases, with theoretical limits based on embedding dimensionality. They introduce the LIMIT benchmark to empirically test these limitations and suggest alternatives like cross-encoders, multi-vector models, and sparse models.

2025-09-05 Tags: rag, retrieval-augmented generation, embeddings, google deepmind, limit benchmark, ai, machine learning, sparse models, cross-encoders, multi-vector models by klotz

Automatic Embeddings in Postgres

This article details how to automate embedding generation and updates in Postgres using Supabase Vector, Queues, Cron, and the pg_net extension together with Edge Functions, addressing the drift, latency, and complexity of traditional external embedding pipelines.

2025-04-02 Tags: supabase, postgres, embeddings, semantic search, rag, pgvector, edge functions, database by klotz

SQLite RAG Tutorial

A simple project demonstrating Retrieval Augmented Generation (RAG) using SQLite, sqlite-vec, and OpenAI. It embeds text files, stores them in a SQLite database, and retrieves relevant documents using vector search. The project features lightweight single-file SQLite databases, vector search capabilities, and OpenAI integration for embeddings and chat responses.
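The store-then-retrieve flow described here can be approximated with only the standard library. This sketch does not use sqlite-vec or OpenAI: embeddings are stored as packed float blobs in an ordinary table, similarity is computed in Python, and the three-dimensional vectors are hand-made placeholders for real embeddings. The table name `docs` and the helper names are assumptions for illustration.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as a float32 blob."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Deserialize a float32 blob back into a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, embedding BLOB)")

# Placeholder vectors (assumption): a real pipeline would call an embedding model.
rows = [
    ("SQLite keeps the whole database in a single file.", [1.0, 0.0, 0.0]),
    ("Chat models generate answers from retrieved text.", [0.0, 1.0, 0.0]),
    ("Vector search finds stored text close to a query.", [0.7, 0.7, 0.0]),
]
conn.executemany(
    "INSERT INTO docs (body, embedding) VALUES (?, ?)",
    [(body, pack(vec)) for body, vec in rows],
)


def nearest(conn, query_vec, k=2):
    """Brute-force scan: score every stored embedding against the query vector."""
    scored = [
        (cosine(query_vec, unpack(blob)), body)
        for body, blob in conn.execute("SELECT body, embedding FROM docs")
    ]
    return sorted(scored, reverse=True)[:k]


hits = nearest(conn, [0.9, 0.1, 0.0], k=2)
```

A vector extension moves the distance computation into SQL so the scan does not happen in Python, but the table-plus-embedding layout stays conceptually similar.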

2025-02-20 Tags: sqlite, rag, sqlite-vec, vector search, embeddings, llm, github, edizaguirre by klotz

Discovering Semantic Search and RAG with Large Language Models (LLMs)

Covers the foundational concepts, a practical implementation of semantic search, and the RAG workflow, highlighting its advantages and versatile applications.

The article provides a step-by-step guide to implementing a basic semantic search using TF-IDF and cosine similarity. This includes preprocessing steps, converting text to vectors, and searching for relevant documents based on query similarity.
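As a rough illustration of that TF-IDF and cosine similarity workflow, the sketch below tokenizes documents, weights term counts by a smoothed inverse document frequency, and ranks documents against a query. The function names and sample documents are made up for this example, and the smoothing formula is one common choice rather than necessarily the one the article uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Preprocessing: lowercase and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weigh(tokens, idf):
    """Sparse TF-IDF vector as {term: weight}; terms unseen at fit time are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def build_tfidf(docs):
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, [weigh(toks, idf) for toks in tokenized]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, docs, idf, vectors, k=2):
    q = weigh(tokenize(query), idf)
    scored = sorted(((cosine(q, v), d) for v, d in zip(vectors, docs)), reverse=True)
    return scored[:k]


docs = [
    "FAISS stores embeddings for fast similarity search.",
    "Ollama runs language models locally.",
    "TF-IDF weighs rare terms more heavily than common ones.",
]
idf, vectors = build_tfidf(docs)
results = search("similarity search with embeddings", docs, idf, vectors, k=2)
```

Because TF-IDF only matches shared terms, a query with no vocabulary overlap scores zero against every document. That gap is what dense embeddings aim to close.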

2024-10-04 Tags: llm, semantic search, rag, nlp, embeddings, asymmetric by klotz

Advanced RAG Techniques

This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and contextually rich responses.

2024-08-01 Tags: rag, nlp, machine learning, information retrieval, natural language processing, llm, embeddings, semantic search by klotz
