FailSafe is an open-source, modular framework designed to automate the verification of textual claims. It employs a multi-stage pipeline that integrates Large Language Models (LLMs) with retrieval-augmented generation (RAG) techniques.
This article details how to build a 100% local MCP (Model Context Protocol) client using LlamaIndex, Ollama, and LightningAI. It provides a code walkthrough and explanation of the process, including setting up an SQLite MCP server and a locally served LLM.
An extensible Model Context Protocol (MCP) server that provides intelligent semantic code search for AI assistants. Built with local AI models using Matryoshka Representation Learning (MRL) for flexible embedding dimensions.
A curated repository of AI-powered applications and agentic systems showcasing practical use cases of Large Language Models (LLMs) from providers like Google, Anthropic, OpenAI, and self-hosted open-source models.
Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.
This article explores different chunking strategies for Retrieval-Augmented Generation (RAG) systems, comparing nine approaches using the agenticmemory library to improve retrieval accuracy and reduce hallucinations.
A comprehensive overview of the current state of the Model Context Protocol (MCP), including advancements, challenges, and future directions.
This article explores the architecture enabling AI chatbots to perform web searches, covering retrieval-augmented generation (RAG), vector databases, and the challenges of integrating search with LLMs.
This article explores how to use LLMLingua, a tool developed by Microsoft, to compress prompts for large language models, reducing costs and improving efficiency without retraining models.
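A minimal sketch of what that compression looks like in code, assuming the `llmlingua` package's `PromptCompressor` interface; the input file, question, and token budget below are illustrative, not from the article:

```python
# Hedged sketch of prompt compression with LLMLingua (pip install llmlingua).
from llmlingua import PromptCompressor

# Downloads its compression model on first use; a smaller model can be
# passed via model_name if the default is too heavy for your machine.
compressor = PromptCompressor()

long_context = open("retrieved_context.txt").read()  # hypothetical input

result = compressor.compress_prompt(
    long_context,
    instruction="Answer the question using only the context.",
    question="What were the key findings?",  # placeholder question
    target_token=200,  # rough budget for the compressed context
)

print(result["compressed_prompt"])  # shortened prompt to send to the LLM
print(result["origin_tokens"], "->", result["compressed_tokens"])
```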
A tutorial on building a private, offline Retrieval-Augmented Generation (RAG) system using Ollama for embeddings and language generation and FAISS for vector storage, ensuring data privacy and control. The system is built from six components:
1. **Document Loader:** Extracts text from various file formats (PDF, Markdown, HTML) while preserving metadata like source and page numbers for accurate citations.
2. **Text Chunker:** Splits documents into smaller text segments (chunks) to manage token limits and improve retrieval accuracy, using overlap and sentence-boundary detection to maintain context (a minimal chunker sketch follows this list).
3. **Embedder:** Converts text chunks into numerical vectors (embeddings) using the `nomic-embed-text` model via Ollama, which runs locally without internet access.
4. **Vector Database:** Stores the embeddings in FAISS (Facebook AI Similarity Search) for fast similarity search. It uses cosine similarity for accurate retrieval and persists the index to disk so it loads quickly in future sessions (see the embedding-and-indexing sketch below).
5. **Large Language Model (LLM):** Generates answers with the `llama3.2` model via Ollama, also running locally. It combines the retrieved context with the user's question to produce a response with citations (see the query sketch below).
6. **RAG System Orchestrator:** Coordinates the entire workflow, managing the ingestion of documents (loading, chunking, embedding, storing) and the querying process (retrieving relevant chunks, generating answers).
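To make the pipeline concrete, a few hedged sketches follow. First, a minimal version of the chunker from step 2, splitting on sentence boundaries and carrying a few sentences of overlap between consecutive chunks; the size and overlap values are illustrative, not the tutorial's settings:

```python
import re

def chunk_text(text: str, max_chars: int = 1000, overlap_sents: int = 2) -> list[str]:
    """Split text into chunks on sentence boundaries, overlapping a few
    sentences between consecutive chunks to preserve context."""
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, length = [], [], 0
    for sent in sentences:
        if length + len(sent) > max_chars and current:
            chunks.append(" ".join(current))
            current = current[-overlap_sents:]  # carry over trailing sentences
            length = sum(len(s) + 1 for s in current)
        current.append(sent)
        length += len(sent) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```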
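Steps 3 and 4 can be sketched together: embed each chunk with `nomic-embed-text` through the local Ollama server, then index the vectors in FAISS. Normalizing the vectors and using an inner-product index yields cosine similarity. The `ollama` client calls below follow its documented Python API, but the source file path is hypothetical:

```python
import faiss
import numpy as np
import ollama

def embed(texts: list[str]) -> np.ndarray:
    # One embeddings request per chunk, served by the local Ollama instance.
    vecs = [ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"]
            for t in texts]
    arr = np.array(vecs, dtype="float32")
    faiss.normalize_L2(arr)  # unit vectors: inner product == cosine similarity
    return arr

chunks = chunk_text(open("docs/report.txt").read())  # hypothetical document
vectors = embed(chunks)

index = faiss.IndexFlatIP(vectors.shape[1])  # exact inner-product search
index.add(vectors)
faiss.write_index(index, "rag.index")        # persist for future sessions
```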
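Finally, the query path of steps 5 and 6: embed the question, retrieve the top-k chunks, and have `llama3.2` answer with chunk-id citations. A sketch reusing the `embed`, `index`, and `chunks` names above; the prompt wording is an assumption, not the tutorial's:

```python
def answer(question: str, k: int = 3) -> str:
    # Embed the question and retrieve the k most similar chunks.
    query_vec = embed([question])
    _, ids = index.search(query_vec, k)
    context = "\n\n".join(f"[{i}] {chunks[i]}" for i in ids[0] if i >= 0)

    response = ollama.chat(
        model="llama3.2",
        messages=[
            {"role": "system",
             "content": "Answer using only the context and cite chunk ids like [0]."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["message"]["content"]

print(answer("What does the document recommend?"))
```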