A from-scratch reimplementation of Stanford's XTR-Warp semantic search engine written in safe Rust. It is designed for client-side deployment, using a single-file SQLite database for storage with no external API keys, vector databases, or complex chunking strategies required. The engine delivers very low end-to-end search latency and supports hybrid search, combining semantic results with standard BM25 ranking.
Key features and components:
- High-speed semantic search capable of running on local devices.
- SQLite backend for easy data persistence and portability.
- Multiple inference backends, including quantized T5 weights via candle and OpenVINO.
- Pickbrain CLI example for indexing AI coding session transcripts (Claude Code/OpenAI Codex).
- Hardware acceleration support for Apple Silicon (Metal) and x86 (fbgemm).
- Available as a Node.js native module.
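The hybrid search described above fuses a semantic ranking with a BM25 ranking. One common fusion scheme is reciprocal rank fusion (RRF); this is a minimal sketch of that general technique, not necessarily the fusion method this engine actually uses, and the document IDs are made up:

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc ids into one fused ranking
    using reciprocal rank fusion (a common hybrid-search heuristic)."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked):
            # Each list contributes 1 / (k + rank); the constant k damps
            # the impact of small rank differences near the top.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc3", "doc1", "doc7"]   # e.g. from embedding similarity
bm25     = ["doc1", "doc3", "doc9"]   # e.g. from keyword scoring
fused = rrf_fuse([semantic, bm25])
```

Documents ranked highly by both signals float to the top, while documents seen by only one retriever still get a chance to surface.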
Claude-Mem is a persistent memory compression system designed specifically for Claude Code and Gemini CLI. It automatically captures tool usage observations, generates semantic summaries via AI, and injects relevant context into future sessions to ensure continuity of knowledge across coding projects.
Key features include:
* Persistent memory that survives session restarts
* Progressive disclosure architecture for token-efficient retrieval
* Skill-based search using MCP tools (search, timeline, get_observations)
* Hybrid semantic and keyword search powered by Chroma vector database and SQLite
* Privacy controls via specific tags to exclude sensitive data
* A web viewer UI for real-time memory stream monitoring
RAG combines language models with external knowledge. This article explores context and retrieval in RAG, covering search methods (keyword matching, TF-IDF, and embeddings with FAISS/Chroma), context-length challenges (compression and re-ranking), and contextual retrieval (incorporating the query and conversation history).
Learn how to build a simple semantic search engine using sentence embeddings and nearest neighbors, focusing on the limitations of keyword-based search and leveraging large language models for semantic understanding.
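The core of such an engine can be sketched in a few lines: embed documents, then rank by cosine similarity to the query embedding. In practice the vectors would come from a sentence-embedding model; the 3-d vectors below are made up purely for illustration:

```python
import math

# Toy corpus with hand-made "embeddings"; a real system would use a
# sentence-embedding model to produce these vectors.
docs = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "best pasta recipes":      [0.0, 0.2, 0.9],
    "recover account access":  [0.8, 0.3, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, k=2):
    # Nearest-neighbor search: rank documents by cosine similarity.
    ranked = sorted(docs, key=lambda d: cosine(docs[d], query_vec), reverse=True)
    return ranked[:k]

# A query like "forgot my login" shares no keywords with the docs,
# but its embedding would land near the account-related vectors.
results = search([0.85, 0.2, 0.05])
```

This is exactly the property keyword search lacks: relevance comes from vector proximity, not token overlap.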
This article compares the performance of LLM embeddings, TF-IDF, and Bag of Words for text vectorization and information retrieval tasks using scikit-learn. It provides a practical comparison with code examples and discusses the strengths and weaknesses of each approach.
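The contrast between Bag of Words and TF-IDF can be shown with a tiny stdlib-only sketch (the article itself uses scikit-learn's vectorizers; this toy corpus is invented for illustration):

```python
import math
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [doc.split() for doc in corpus]

# Bag of Words: raw term counts per document.
bow = [Counter(doc) for doc in tokenized]

# Document frequency: in how many documents each term appears.
n_docs = len(tokenized)
df = Counter(term for doc in tokenized for term in set(doc))

def tfidf(doc_counts):
    # Term frequency scaled by inverse document frequency, which
    # down-weights words like "the" that appear in most documents.
    total = sum(doc_counts.values())
    return {t: (c / total) * math.log(n_docs / df[t]) for t, c in doc_counts.items()}

weights = tfidf(bow[0])
# "the" appears in 2 of 3 docs, so it ends up weighted below "cat",
# which appears in only 1.
```

Bag of Words would score "the" highest in the first document (it occurs twice); TF-IDF demotes it because it carries little discriminative signal across the corpus.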
This article explains the internal workings of vector databases, highlighting that they don't perform a brute-force search as commonly described. It details algorithms like HNSW, IVF, and PQ, the tradeoffs between recall, speed, and memory, and how different RAG patterns impact vector database usage. It also discusses production challenges like filtering, updates, and sharding.
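The IVF idea from the article can be sketched in miniature: vectors are bucketed by their nearest coarse centroid, and a query scans only the closest bucket(s) instead of every vector. This is a simplified illustration with hand-picked centroids; real systems such as FAISS train the centroids with k-means and often combine IVF with PQ compression:

```python
import math

centroids = [[0.0, 0.0], [10.0, 10.0]]   # coarse quantizer (hand-picked here)
vectors = {
    "a": [0.1, 0.2], "b": [9.8, 10.1],
    "c": [0.3, -0.1], "d": [10.2, 9.9],
}

# Build inverted lists: assign each vector to its nearest centroid.
lists = {i: [] for i in range(len(centroids))}
for name, v in vectors.items():
    i = min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
    lists[i].append(name)

def ivf_search(q, nprobe=1):
    # Probe only the nprobe nearest buckets: faster than brute force,
    # but can miss a true neighbor that fell in an unprobed bucket --
    # the recall/speed tradeoff the article describes.
    order = sorted(range(len(centroids)), key=lambda c: math.dist(q, centroids[c]))
    candidates = [n for i in order[:nprobe] for n in lists[i]]
    return min(candidates, key=lambda n: math.dist(q, vectors[n]))

nearest = ivf_search([9.5, 9.5])
```

Raising `nprobe` trades speed back for recall, which is the same knob production vector databases expose.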
LocalAI is a free and open-source AI stack that allows you to run language models, autonomous agents, and document intelligence locally on your hardware. It's an OpenAI API-compatible alternative focused on privacy, ease of use, and extensibility.
The article explores whether combining a command-line agent (like Claude Code or Gemini CLI) with Unix-like file system tools and SemTools is sufficient for complex tasks, particularly document search. It details a benchmark testing the limits of coding agents with and without SemTools, focusing on search, cross-referencing, and temporal analysis. The conclusion is that CLI access is powerful and SemTools enhances agent capabilities for document search and RAG.
Semantic search and document parsing tools for the command line. A collection of high-performance CLI tools for document processing and semantic search, built with Rust for speed and reliability.
Ryan speaks with Edo Liberty, Founder and CEO of Pinecone, about building vector databases, the power of embeddings, the evolution of RAG, and fine-tuning AI models.