A tutorial on building a private, offline Retrieval Augmented Generation (RAG) system using Ollama for embeddings and language generation, and FAISS for vector storage, ensuring data privacy and control.
1. **Document Loader:** Extracts text from various file formats (PDF, Markdown, HTML) while preserving metadata like source and page numbers for accurate citations.
2. **Text Chunker:** Splits documents into smaller text segments (chunks) to manage token limits and improve retrieval accuracy. It uses overlapping and sentence boundary detection to maintain context.
3. **Embedder:** Converts text chunks into numerical vectors (embeddings) using the `nomic-embed-text` model via Ollama, which runs locally without internet access.
4. **Vector Database:** Stores the embeddings using FAISS (Facebook AI Similarity Search) for fast similarity search. It uses cosine similarity for accurate retrieval and saves the database to disk for quick loading in future sessions.
5. **Large Language Model (LLM):** Generates answers using the `llama3.2` model via Ollama, also running locally. It takes the retrieved context and the user's question to produce a response with citations.
6. **RAG System Orchestrator:** Coordinates the entire workflow, managing the ingestion of documents (loading, chunking, embedding, storing) and the querying process (retrieving relevant chunks, generating answers).
This tutorial provides a gentle introduction to building a modern text recognition system using deep learning in just 15 minutes. It teaches the main ideas of using Keras and Supervisely for this problem, and is suitable for anyone interested in using deep learning for text recognition in images but doesn't know where to start. The tutorial focuses on a real-world example: number plate recognition. The authors also share their understanding of creating effective recognition systems in two short video lectures, available on YouTube.