SemanticScuttle - klotz.me

Tags: faiss*


Building a RAG System That Runs Completely Offline

A tutorial on building a private, offline Retrieval Augmented Generation (RAG) system using Ollama for embeddings and language generation, and FAISS for vector storage, ensuring data privacy and control.

1. **Document Loader:** Extracts text from various file formats (PDF, Markdown, HTML) while preserving metadata like source and page numbers for accurate citations.
2. **Text Chunker:** Splits documents into smaller text segments (chunks) to stay within token limits and improve retrieval accuracy. It uses overlapping chunks and sentence-boundary detection to maintain context.
3. **Embedder:** Converts text chunks into numerical vectors (embeddings) using the `nomic-embed-text` model via Ollama, which runs locally without internet access.
4. **Vector Database:** Stores the embeddings using FAISS (Facebook AI Similarity Search) for fast similarity search. It uses cosine similarity for accurate retrieval and saves the database to disk for quick loading in future sessions.
5. **Large Language Model (LLM):** Generates answers using the `llama3.2` model via Ollama, also running locally. It takes the retrieved context and the user's question to produce a response with citations.
6. **RAG System Orchestrator:** Coordinates the entire workflow, managing the ingestion of documents (loading, chunking, embedding, storing) and the querying process (retrieving relevant chunks, generating answers).
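
As a rough illustration of steps 2 through 4, the sketch below chunks text on sentence boundaries with a one-sentence overlap and ranks the chunks by cosine similarity. It is not the tutorial's code: `chunk_text`, `toy_embed`, and `top_k` are hypothetical names, the word-count `toy_embed` only stands in for `nomic-embed-text`, and the plain-Python search stands in for FAISS so that the example runs with the standard library alone.

```python
import math
import re


def chunk_text(text, max_chars=200, overlap_sentences=1):
    """Split text on sentence boundaries; each new chunk starts with the
    last `overlap_sentences` sentences of the previous chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(text, vocab):
    """Hypothetical stand-in for a real embedding model: word counts."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [float(words.count(term)) for term in vocab]


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the k best (cosine_similarity, chunk_index) pairs."""
    q = normalize(query_vec)
    scores = [
        (sum(a * b for a, b in zip(q, normalize(v))), i)
        for i, v in enumerate(chunk_vecs)
    ]
    return sorted(scores, reverse=True)[:k]


text = (
    "FAISS stores vectors. Ollama runs models locally. "
    "Chunks overlap to keep context. Citations need page numbers."
)
chunks = chunk_text(text, max_chars=60)
vocab = sorted({w for c in chunks for w in re.findall(r"[a-z0-9]+", c.lower())})
vectors = [toy_embed(c, vocab) for c in chunks]
query_vec = toy_embed("Where are vectors stored?", vocab)
best_score, best_index = top_k(query_vec, vectors, k=1)[0]
print(chunks[best_index])
```

With FAISS itself, cosine similarity is commonly obtained the same way: L2-normalize the vectors and search an inner-product index, since the inner product of unit vectors equals their cosine.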

2025-11-15 Tags: rag, self-hosted, llm, ollama, faiss, embeddings, vector database, hackernoon by klotz

Enhancing GPU-Accelerated Vector Search in Faiss with NVIDIA cuVS

This post explores how to solve challenges in vector search using NVIDIA cuVS with the Meta Faiss library. It covers the benefits of integration, performance improvements, benchmarks, and code examples.

2025-11-07 Tags: vector search, faiss, nvidia cuvs, gpu acceleration, ivf, cagra, rag, large language models, machine learning by klotz

Hitchhiker’s Guide to RAG: From Tiny Files to Tolstoy with OpenAI’s API and LangChain

Scaling a simple RAG pipeline from short notes to full books. This post explains how to handle larger files in a RAG pipeline by adding one extra step to the process: chunking.

2025-08-20 Tags: rag, openai, langchain, llm, vector database, faiss, chunking, medium by klotz

Implementing semantic cache to improve a RAG system with FAISS

This notebook explores a typical RAG solution built on an open-source model and the Chroma DB vector database. It adds a semantic cache that stores user queries and decides whether to build the enriched prompt from the vector database or from the cache.
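
The cache decision described here can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation: `SemanticCache`, `answer`, the `embed` callable, and the `0.9` threshold are all assumptions made for the example, and a plain Python list stands in for whatever index the notebook uses to store cached queries.

```python
import math


class SemanticCache:
    """Hypothetical cache keyed by query meaning rather than exact text."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # callable: str -> list[float] (assumed)
        self.threshold = threshold  # minimum cosine similarity for a hit
        self.entries = []           # list of (unit vector, cached answer)

    @staticmethod
    def _unit(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else list(vec)

    def lookup(self, query):
        """Return the cached answer of the most similar stored query,
        or None if nothing is similar enough."""
        q = self._unit(self.embed(query))
        best_score, best_answer = -1.0, None
        for vec, cached in self.entries:
            score = sum(a * b for a, b in zip(q, vec))
            if score > best_score:
                best_score, best_answer = score, cached
        return best_answer if best_score >= self.threshold else None

    def store(self, query, result):
        self.entries.append((self._unit(self.embed(query)), result))


def answer(query, cache, retrieve_and_generate):
    """Serve from the cache when possible, else run the full RAG path."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached, "cache"
    result = retrieve_and_generate(query)
    cache.store(query, result)
    return result, "vector database"
```

The threshold controls the trade-off: set too low, unrelated questions receive stale answers; set too high, the cache rarely hits.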

2024-03-12 Tags: llm, rag, chromadb, faiss, cache by klotz

Vector Databases - Basics of Vector Search and Langchain Package in Python | HackerNoon

2023-10-01 Tags: search, langchain, faiss, similarity, chromadb, qdrant, milvus by klotz

QA using a Retriever

2023-08-17 Tags: langchain, document, search, llm, chat, faiss, chromadb, vector database by klotz

Running Llama 2 on CPU Inference Locally for Document Q&A | by Kenneth Leung | Jul, 2023 | Towards Data Science

2023-08-17 Tags: document, search, llm, chat, faiss, vector database, llama, llama 2 by klotz
