SemanticScuttle - klotz.me » Tags: large language models+retrieval-augmented generation

Tags: large language models* + retrieval-augmented generation*


Hybrid Search and Re-Ranking in Production RAG

This article explores techniques for optimizing Retrieval-Augmented Generation (RAG) systems by implementing hybrid search and re-ranking mechanisms. It details how to combine dense vector embeddings with sparse keyword matching, such as BM25, to improve retrieval accuracy, followed by the use of a cross-encoder reranker to ensure only the most relevant context is passed to a Large Language Model in production environments.
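
As a rough illustration of the fusion step described above, the sketch below merges a dense (embedding) ranking and a sparse (BM25-style) ranking with reciprocal rank fusion (RRF). RRF is a common choice for hybrid search, but this summary does not say which fusion method the article uses, so treat it as an assumption. The function name, the document IDs, and the constant `k=60` are illustrative only. A cross-encoder reranker would then rescore the top fused candidates; that stage is not shown here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    k=60 is a conventional default, not a value taken from the article.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector index and a keyword index.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that appear in both lists (`doc_a`, `doc_b`) rise to the top, which is the behavior hybrid retrieval relies on before the more expensive reranking pass.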

2026-05-13 Tags: rag, hybrid search, re-ranking, semantic search, vector database, llm by klotz

Memori

Memori is an agent-native memory infrastructure that acts as an LLM-agnostic layer to transform AI agent execution and conversations into structured, persistent state for production systems. It integrates seamlessly into existing architectures, allowing agents to automatically capture and recall information from past interactions without requiring changes to core code or prompts.
Key features and points:
* Provides advanced augmentation of memories including attributes, facts, preferences, relationships, and skills at the entity, process, and session levels.
* Reports high accuracy and token efficiency for long-conversation memory on the LoCoMo benchmark.
* Offers dedicated SDKs for both Python and TypeScript.
* Supports Model Context Protocol (MCP) for easy connection to developer tools like Claude Code and Cursor.
* Compatible with a wide range of LLMs including OpenAI, Anthropic, Gemini, DeepSeek, and Grok, as well as frameworks like LangChain and Pydantic AI.

2026-05-11 Tags: python, agent, typescript, state-management, ai memory, llm, rag, memori-ai, mcp by klotz

temporal-rag

A post-retrieval temporal layer designed to improve RAG systems by addressing time-blindness in vector searches. This library implements validity filtering, document kind classification, and exponential decay scoring to ensure retrieved information is fresh and accurate. It functions downstream of existing vector search systems without requiring re-indexing or new infrastructure.
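
The following is a minimal sketch of what a post-retrieval temporal pass of this kind could look like, assuming each hit carries a similarity score, a publication time, and an optional expiry time. It is not the temporal-rag API: the `Hit` class, `temporal_rerank`, and the 30-day half-life are hypothetical, and document kind classification is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Hit:
    """A retrieved document (hypothetical shape, not temporal-rag's own type)."""
    doc_id: str
    similarity: float
    published: datetime
    valid_until: Optional[datetime] = None


def temporal_rerank(hits, now, half_life_days=30.0):
    """Drop expired hits, then scale similarity by an exponential age decay."""
    ranked = []
    for hit in hits:
        # Validity filtering: skip documents whose validity window has ended.
        if hit.valid_until is not None and hit.valid_until < now:
            continue
        age_days = max((now - hit.published).total_seconds() / 86400.0, 0.0)
        # Exponential decay: the weight halves every `half_life_days`.
        decay = 0.5 ** (age_days / half_life_days)
        ranked.append((hit.similarity * decay, hit.doc_id))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked


now = datetime(2026, 5, 11, tzinfo=timezone.utc)
hits = [
    Hit("old", 0.9, now - timedelta(days=60)),
    Hit("fresh", 0.7, now),
    Hit("expired", 0.95, now - timedelta(days=5), valid_until=now - timedelta(days=1)),
]
ranked = temporal_rerank(hits, now)
print(ranked)
```

Because the pass only reads scores and timestamps from hits that were already retrieved, it can sit downstream of an existing vector search without re-indexing, which matches the design the description outlines.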

2026-05-11 Tags: python, nlp, information retrieval, knowledge base, freshness, reranking, rag, time decay, llm, temporal, search, time, emmimal p alexander, github, emmimal by klotz

RAG Is Blind to Time — I Built a Temporal Layer to Fix It in Production

> "How I added temporal awareness and freshness tracking to a RAG system with no sense of time."

2026-05-11 Tags: rag, search, llm, time, emmimal p alexander by klotz

The RAG era is ending for agentic AI — a new compilation-stage knowledge layer is what comes next

Pinecone is pivoting from traditional RAG toward a new "knowledge engine" called Nexus designed specifically for the needs of agentic AI. By moving reasoning work from inference time to a pre-query compilation stage, Nexus creates persistent, task-specific knowledge artifacts that significantly reduce token costs and improve reliability for autonomous agents.

**Technical Details:**
* **Context Compiler:** Transforms raw enterprise data into structured, reusable "knowledge artifacts" optimized for specific agent roles (e.g., sales or finance) to prevent redundant re-discovery during every session.
* **KnowQL:** A new declarative query language that allows agents to specify intent, output shape, confidence requirements, and latency budgets using six core primitives.
* **Composable Retriever:** Provides typed fields, per-field citations with confidence levels, and deterministic conflict resolution to ensure auditability and structured outputs.
* **Efficiency Gains:** Pinecone’s internal benchmarks demonstrated a 98% reduction in token usage for specific financial analysis tasks by utilizing pre-compiled context rather than raw document retrieval.

2026-05-05 Tags: sean michael kerner, rag, llm, knowledge base, pinecone, nexus, agentic ai, knowledge compilation, knowql by klotz

How to Build an LLM Knowledge Base

This article explores a practical approach to building an LLM knowledge base by treating the model as a compiler rather than just a retrieval tool. Instead of relying solely on complex RAG systems and vector databases, the author proposes a structured workflow that transforms raw source material into a durable, organized wiki. This method focuses on creating lasting value through repeatable processes like indexing, compiling paper pages, developing concept maps, and filing query answers back into the system to create a continuous feedback loop.
Main points:
- Moving beyond traditional RAG toward an LLM-driven compilation workflow.
- Implementing a structured folder hierarchy including raw, wiki, derived, and prompts directories.
- The importance of creating concept pages that connect multiple sources rather than just summarizing individual papers.
- Establishing a feedback loop where query answers are saved back into the knowledge base.
- Using maintenance passes to ensure the system remains updated and cohesive.
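
The folder layout and feedback loop listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the four directory names come from the summary, but the helper names (`init_knowledge_base`, `file_answer`), the choice to store answers under `derived`, and the page format are hypothetical rather than the author's.

```python
import tempfile
from pathlib import Path

# Directory names taken from the summary above.
KB_DIRS = ("raw", "wiki", "derived", "prompts")


def init_knowledge_base(root):
    """Create the top-level folders if they do not exist yet."""
    root = Path(root)
    for name in KB_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def file_answer(root, slug, question, answer):
    """Save a query answer back into the knowledge base as a Markdown page.

    Writing to 'derived' is an assumption made for this sketch.
    """
    page = Path(root) / "derived" / f"{slug}.md"
    page.write_text(f"# {question}\n\n{answer}\n", encoding="utf-8")
    return page


# Use a throwaway directory so the example leaves no files behind in the project.
kb = init_knowledge_base(Path(tempfile.mkdtemp()) / "kb")
page = file_answer(kb, "what-is-rag", "What is RAG?", "Retrieval-augmented generation.")
print(sorted(p.name for p in kb.iterdir()), page.name)
```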

2026-04-29 Tags: llm, knowledge base, rag, ai agents, workflow automation, information architecture, boxy by klotz

OpenKB — Open LLM Knowledge Base

OpenKB is an open-source command-line system designed to transform raw documents into a structured, interlinked wiki-style knowledge base using Large Language Models. Unlike traditional RAG systems that rediscover information with every query, OpenKB compiles knowledge once into a persistent format where summaries, concept pages, and cross-references are automatically maintained and updated.
Key features and capabilities include:
- Vectorless long document retrieval powered by PageIndex tree indexing.
- Native multi-modality for understanding figures, tables, and images.
- Broad format support including PDF, Word, Markdown, PowerPoint, HTML, and Excel.
- Automated wiki compilation that creates summaries and synthesizes concepts across documents.
- Interactive chat sessions with persisted history and Obsidian compatibility via wikilinks.
- Health check tools (linting) to identify contradictions, gaps, or stale content within the knowledge base.

2026-04-27 Tags: llm, retrieval, knowledge base, agents, rag, open source, pageindex, github, vectifyai, openkb by klotz

A Coding Implementation on Microsoft’s Phi-4-Mini for Quantized Inference, Reasoning, Tool Use, RAG, and LoRA Fine-Tuning

This tutorial provides a comprehensive coding walkthrough for building an advanced AI pipeline using Microsoft's Phi-4-mini language model. The guide demonstrates how to leverage this compact model for high-performance tasks within resource-constrained environments like Google Colab.
Key topics covered include:
- Setting up 4-bit quantized inference to optimize GPU memory usage.
- Implementing streaming chat and multi-step chain-of-thought reasoning.
- Executing native tool calling and function calling for agentic interactions.
- Building a retrieval-augmented generation (RAG) pipeline using FAISS and sentence transformers.
- Performing lightweight LoRA fine-tuning to inject new knowledge into the model.

2026-04-26 Tags: microsoft phi-4-mini, quantized inference, llm tutorial, rag, lora fine-tuning, tool use, chain-of-thought reasoning, small language models, llm, hallux by klotz

Welcome to Prove AI

Prove AI is developing an observability-first foundation designed for production generative AI systems. Their mission is to enable engineering teams to understand, diagnose, and remediate failures within complex AI pipelines, including LLM inference, retrieval processes, and agent orchestration.
The current release, v0.1, provides an opinionated observability pipeline specifically for generative AI workloads through:
- A containerized, OpenTelemetry-based telemetry pipeline.
- Preconfigured collection of traces, metrics, and logs tailored for AI systems.
- Instrumentation patterns for RAG pipelines, embeddings, LLM inference, and agent-based systems.
- Compatibility with standard backends like Prometheus.

2026-04-14 Tags: llm, observability, production engineering, opentelemetry, inference, rag, telemetry by klotz

Claude-Mem

Claude-Mem is a persistent memory compression system designed specifically for Claude Code and Gemini CLI. It automatically captures tool usage observations, generates semantic summaries via AI, and injects relevant context into future sessions to ensure continuity of knowledge across coding projects.
Key features include:
* Persistent memory that survives session restarts
* Progressive disclosure architecture for token-efficient retrieval
* Skill-based search using MCP tools (search, timeline, get_observations)
* Hybrid semantic and keyword search powered by Chroma vector database and SQLite
* Privacy controls via specific tags to exclude sensitive data
* A web viewer UI for real-time memory stream monitoring

2026-04-13 Tags: llm, agents, claude code, persistent memory, rag, mcp tools, context management, semantic search by klotz
