klotz: rag*

0 bookmark(s) - Sort by: Date ↓ / Title / - Bookmarks from other users for this tag

  1. This article explores how to use LLMLingua, a tool developed by Microsoft, to compress prompts for large language models, reducing costs and improving efficiency without retraining models.
  2. A tutorial on building a private, offline Retrieval Augmented Generation (RAG) system using Ollama for embeddings and language generation, and FAISS for vector storage, ensuring data privacy and control.

    1. **Document Loader:** Extracts text from various file formats (PDF, Markdown, HTML) while preserving metadata like source and page numbers for accurate citations.
    2. **Text Chunker:** Splits documents into smaller text segments (chunks) to manage token limits and improve retrieval accuracy. It uses overlapping and sentence boundary detection to maintain context.
    3. **Embedder:** Converts text chunks into numerical vectors (embeddings) using the `nomic-embed-text` model via Ollama, which runs locally without internet access.
    4. **Vector Database:** Stores the embeddings using FAISS (Facebook AI Similarity Search) for fast similarity search. It uses cosine similarity for accurate retrieval and saves the database to disk for quick loading in future sessions.
    5. **Large Language Model (LLM):** Generates answers using the `llama3.2` model via Ollama, also running locally. It takes the retrieved context and the user's question to produce a response with citations.
    6. **RAG System Orchestrator:** Coordinates the entire workflow, managing the ingestion of documents (loading, chunking, embedding, storing) and the querying process (retrieving relevant chunks, generating answers).
  3. This post explores how to solve challenges in vector search using NVIDIA cuVS with the Meta Faiss library. It covers the benefits of integration, performance improvements, benchmarks, and code examples.
  4. This paper addresses the misalignment between traditional IR evaluation metrics and the requirements of modern Retrieval-Augmented Generation (RAG) systems. It proposes a novel annotation schema and the UDCG metric to better evaluate retrieval quality for LLM consumers.
  5. This article details the process of building a fast vector search system for a large legal dataset (Australian High Court decisions). It covers choosing embedding providers, performance benchmarks, using USearch and Isaacus embeddings, and the importance of API terms of service. It focuses on achieving speed and scalability while maintaining reasonable accuracy.
  6. IBM is releasing Granite-Docling-258M, an ultra-compact and cutting-edge open-source vision-language model (VLM) for converting documents to machine-readable formats while preserving layout, tables, equations, and more. It's designed for accurate and efficient document conversion and excels beyond simple text extraction.
  7. Plural is bringing AI into the DevOps lifecycle with a new release that leverages a unified GitOps platform as a RAG engine. This provides AI-powered troubleshooting, natural language infrastructure querying, autonomous upgrade assistance, and agentic workflows for infrastructure modification, all with enterprise-grade guardrails.
  8. This article explains the internal workings of vector databases, highlighting that they don't perform a brute-force search as commonly described. It details algorithms like HNSW, IVF, and PQ, the tradeoffs between recall, speed, and memory, and how different RAG patterns impact vector database usage. It also discusses production challenges like filtering, updates, and sharding.
  9. A curated collection of Awesome LLM apps built with RAG, AI Agents, Multi-agent Teams, MCP, Voice Agents, and more. This repository features LLM apps that use models from OpenAI, Anthropic, Google, xAI and open-source models like Qwen or Llama.
  10. The article explores whether combining a command-line agent (like Claude Code or Gemini CLI) with Unix-like file system tools and SemTools is sufficient for complex tasks, particularly document search. It details a benchmark testing the limits of coding agents with and without SemTools, focusing on search, cross-referencing, and temporal analysis. The conclusion is that CLI access is powerful and SemTools enhances agent capabilities for document search and RAG.

Top of the page

First / Previous / Next / Last / Page 5 of 0 SemanticScuttle - klotz.me: Tags: rag

About - Propulsed by SemanticScuttle