This article discusses common issues with Retrieval-Augmented Generation (RAG) systems, such as context blindness and first-person confusion, and proposes ways to improve retrieval accuracy in RAG pipelines built on local LLMs.
This article introduces pyramid search, an approach built on Agentic Knowledge Distillation, to address the limitations of traditional RAG document-ingestion strategies.
The pyramid structure supports retrieval at multiple levels: atomic insights, concepts, abstracts, and recollections. It mimics a knowledge graph but stores everything as natural language, which makes it more efficient for LLMs to work with.
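A minimal sketch of multi-level retrieval over such a pyramid follows. Only the four level names come from the text; the `PyramidEntry` type, the `searchPyramid` function, and the word-overlap (Jaccard) score, which stands in for the embedding similarity a real system would more likely use, are hypothetical illustrations rather than the article's implementation.

```typescript
// Pyramid levels named in the text, from most detailed to most general.
type Level = "insight" | "concept" | "abstract" | "recollection";
const LEVELS: Level[] = ["insight", "concept", "abstract", "recollection"];

// Hypothetical record: each entry is plain natural-language text tagged with its level.
interface PyramidEntry {
  level: Level;
  docId: string; // source document the entry was distilled from
  text: string;
}

// Lowercased word set; a crude stand-in for an embedding (assumption).
function tokens(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard overlap between the query words and the entry words.
function score(query: Set<string>, text: string): number {
  const words = tokens(text);
  let shared = 0;
  query.forEach((w) => {
    if (words.has(w)) shared++;
  });
  const union = query.size + words.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Return up to k matching entries per level, best first, so a prompt can mix
// fine-grained insights with broader concepts, abstracts, and recollections.
function searchPyramid(entries: PyramidEntry[], query: string, k = 2): Record<Level, PyramidEntry[]> {
  const q = tokens(query);
  const result = {} as Record<Level, PyramidEntry[]>;
  for (const level of LEVELS) {
    result[level] = entries
      .filter((e) => e.level === level)
      .map((e) => ({ e, s: score(q, e.text) }))
      .filter(({ s }) => s > 0) // drop entries with no word overlap
      .sort((a, b) => b.s - a.s)
      .slice(0, k)
      .map(({ e }) => e);
  }
  return result;
}

// Usage with made-up entries.
const pyramid: PyramidEntry[] = [
  { level: "insight", docId: "manual", text: "The pump operates at 40 bar." },
  { level: "insight", docId: "manual", text: "Maintenance is due every six months." },
  { level: "concept", docId: "manual", text: "Pump maintenance schedule." },
  { level: "recollection", docId: "user", text: "The user prefers metric units." },
];
console.log(searchPyramid(pyramid, "pump maintenance").concept[0].text); // prints "Pump maintenance schedule."
```

Because recollections hold information useful across all tasks, a real system might include them unconditionally instead of ranking them against the query, as this sketch does.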
**Knowledge Distillation Process**:
- **Conversion to Markdown**: Documents are converted to Markdown for better token efficiency and easier processing.
- **Atomic Insights Extraction**: Each page is processed with a two-page sliding window to produce a list of insights written as simple sentences.
- **Concept Distillation**: Higher-level concepts are identified from the insights, reducing noise while preserving essential information.
- **Abstract Creation**: An LLM writes a comprehensive abstract for each document, capturing dense information efficiently.
- **Recollections/Memories**: Critical information useful across all tasks is stored at the top of the pyramid.
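The ingestion steps above might be wired together roughly as follows. This is a sketch under stated assumptions: pages are already Markdown strings, `Llm` is a hypothetical stand-in for whatever local model client is used, the window is assumed to pair each page with the page after it, and the prompts are illustrative rather than the article's. Recollections are left out because the text does not say how they are produced.

```typescript
// Hypothetical LLM client: takes a prompt and returns the model's text reply.
type Llm = (prompt: string) => string;

// Two-page sliding window (assumed layout): each page is paired with the page
// after it for context; the last page has no successor and is sent alone.
function twoPageWindows(pages: string[]): Array<[string, string?]> {
  return pages.map((page, i): [string, string?] => (i + 1 < pages.length ? [page, pages[i + 1]] : [page]));
}

// Split an LLM reply into trimmed, non-empty lines.
function lines(reply: string): string[] {
  return reply
    .split("\n")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Atomic insights: ask for simple sentences about the focus page only, using the
// next page as context, and drop exact duplicates produced by overlapping windows.
function extractInsights(pages: string[], llm: Llm): string[] {
  const seen = new Set<string>();
  for (const [focus, next] of twoPageWindows(pages)) {
    const prompt =
      "List each fact on PAGE as one short, self-contained sentence per line.\n" +
      `PAGE:\n${focus}\n` +
      (next !== undefined ? `NEXT PAGE (context only):\n${next}\n` : "");
    for (const sentence of lines(llm(prompt))) seen.add(sentence);
  }
  return Array.from(seen); // a Set keeps first-seen order
}

// Concepts are distilled from the insights; the abstract is written from the
// full document. Both prompts are illustrative assumptions.
function distill(pages: string[], llm: Llm) {
  const insights = extractInsights(pages, llm);
  const concepts = lines(llm("Group these insights into higher-level concepts, one per line:\n" + insights.join("\n")));
  const abstract = llm("Write a comprehensive abstract of this document:\n" + pages.join("\n\n")).trim();
  return { insights, concepts, abstract };
}
```

Passing the model in as a plain function keeps the pipeline independent of any particular client and lets it be exercised with a stub that returns canned text.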
BackToIt is a bookmarking app for managing and organizing web links. It lets you save, organize, and share bookmarks in just two clicks, and offers full-text search, reading-time estimates, and customizable tags. The app is accessible across devices, and it emphasizes security and a clean experience by avoiding ads and spam.
LLM-powered bookmark search engine that lets you search your local browser bookmarks using natural language.
Self-hosted collaborative bookmark manager to collect, organize, and preserve webpages, articles, and more...
Web application (Recall) that summarizes online content and automatically categorizes and interlinks it for easy rediscovery, helping you save time and build a knowledge base.
GitHub issue about supporting element targeting on a page via a querySelector string, then checking whether the textContent of the returned element(s) matches a regex string.
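The check described in the issue can be sketched as a small helper. The name `selectorTextMatches`, the `QueryRoot` type, and the `"any"`/`"all"` mode (covering the issue's ambiguous "element(s)") are hypothetical choices for illustration, not the API proposed in the issue.

```typescript
// Minimal structural type so the sketch compiles without DOM typings; in a
// browser, `document` and any Element satisfy it (a NodeList is array-like).
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<{ textContent: string | null }>;
}

type MatchMode = "any" | "all";

// Hypothetical helper: select elements with a CSS selector, then test their
// textContent against a regex given as a string. Returns false when the
// selector matches nothing, so "all" is never vacuously true.
function selectorTextMatches(
  root: QueryRoot,
  selector: string,
  pattern: string,
  mode: MatchMode = "any",
  flags = "",
): boolean {
  // Drop the stateful g/y flags so repeated test() calls don't depend on lastIndex.
  const re = new RegExp(pattern, flags.replace(/[gy]/g, ""));
  const texts = Array.from(root.querySelectorAll(selector), (el) => el.textContent ?? "");
  if (texts.length === 0) return false;
  return mode === "any" ? texts.some((t) => re.test(t)) : texts.every((t) => re.test(t));
}
```

In a browser this could be called as `selectorTextMatches(document, "#status", "^(done|complete)$", "any", "i")`. Accepting the regex as a string mirrors the issue, which suits configuration files where a `RegExp` object cannot be stored directly.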