Tags: retrieval-augmented generation


  1. Claude-Mem is a persistent memory compression system designed specifically for Claude Code and Gemini CLI. It automatically captures tool usage observations, generates semantic summaries via AI, and injects relevant context into future sessions to ensure continuity of knowledge across coding projects.
    Key features include:
    * Persistent memory that survives session restarts
    * Progressive disclosure architecture for token-efficient retrieval
    * Skill-based search using MCP tools (search, timeline, get_observations)
    * Hybrid semantic and keyword search powered by Chroma vector database and SQLite
    * Privacy controls via specific tags to exclude sensitive data
    * A web viewer UI for real-time memory stream monitoring
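The hybrid search feature can be sketched in plain Python. This is an illustration only, not Claude-Mem's implementation: a bag-of-words counter stands in for Chroma's learned embeddings, a token-overlap score stands in for SQLite keyword search, and the `alpha` blending weight is a made-up parameter.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system (e.g. Chroma) would
    use a learned vector model. Stand-in for illustration only."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, doc):
    """Fraction of query tokens that appear verbatim in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_search(query, docs, alpha=0.5):
    """Blend semantic and keyword scores; alpha weights the semantic side."""
    qv = embed(query)
    scored = [
        (alpha * cosine(qv, embed(d)) + (1 - alpha) * keyword_score(query, d), d)
        for d in docs
    ]
    return [d for s, d in sorted(scored, reverse=True)]

docs = [
    "session summary: refactored the parser module",
    "observation: ran unit tests for the parser",
    "note: updated README badges",
]
print(hybrid_search("parser tests", docs)[0])
```

Blending the two signals lets exact identifiers (where keyword match shines) and paraphrased concepts (where embeddings shine) both surface.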
  2. graphify is an AI coding assistant skill that transforms codebases, documents, and images into a structured, queryable knowledge graph. By utilizing deterministic AST parsing via tree-sitter for code and multimodal LLM capabilities for unstructured data like PDFs and screenshots, it creates a comprehensive map of concepts and relationships. This allows developers to understand complex architectures faster and find the "why" behind design decisions. A key advantage is its massive reduction in token usage per query compared to reading raw files, making it highly efficient for large-scale projects. The tool supports 19 programming languages and integrates seamlessly with platforms like Claude Code and Codex, providing an interactive, persistent, and highly organized way to navigate any codebase or research corpus.
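The deterministic-AST idea can be illustrated with Python's stdlib `ast` module. graphify itself uses tree-sitter across 19 languages; the `build_call_graph` function and sample `SOURCE` below are hypothetical, showing only the core move of turning parsed code into graph edges.

```python
import ast

SOURCE = """
def fetch(url):
    return url

def main():
    data = fetch("https://example.com")
    return data
"""

def build_call_graph(source):
    """Walk the AST and record (caller, callee) edges for plain-name calls.
    A knowledge-graph tool would store these edges alongside docstrings,
    imports, and class relationships for later querying."""
    tree = ast.parse(source)
    edges = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.append((node.name, inner.func.id))
    return edges

print(build_call_graph(SOURCE))
```

Querying such edges ("what calls `fetch`?") returns a handful of tuples instead of the raw files, which is where the token savings per query come from.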
  3. This paper introduces Meta-Harness, an innovative outer-loop system designed to automate the optimization of model harnesses for large language model (LLM) applications. While traditional harnesses are largely designed by hand, Meta-Harness employs an agentic proposer that searches over harness code by accessing source code, scores, and execution traces. The researchers demonstrate significant performance gains across multiple domains: improving text classification efficiency, enhancing accuracy in retrieval-augmented math reasoning for IMO-level problems, and surpassing hand-engineered baselines in agentic coding tasks. The results suggest that providing automated systems with richer access to prior experience can successfully enable the automated engineering of complex LLM harnesses.
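The outer-loop shape the paper describes (propose a harness variant, score it, keep the best) can be sketched as a toy hill-climb. Everything below is a stand-in: the real proposer is an LLM agent that reads source code, scores, and execution traces rather than making random perturbations, and the quadratic `score` is a fake objective.

```python
import random

def score(harness):
    """Stand-in for running the harness on a benchmark; peaks at
    temperature 0.3 purely for illustration."""
    return -(harness["temperature"] - 0.3) ** 2

def propose(best, history):
    """Stand-in proposer: random perturbation. Meta-Harness instead uses
    an agent conditioned on prior code, scores, and traces."""
    new = dict(best)
    new["temperature"] = min(1.0, max(0.0, best["temperature"] + random.uniform(-0.1, 0.1)))
    return new

def outer_loop(initial, steps=50, seed=0):
    random.seed(seed)
    best, best_score, history = initial, score(initial), []
    for _ in range(steps):
        cand = propose(best, history)
        s = score(cand)
        history.append((cand, s))
        if s > best_score:
            best, best_score = cand, s
    return best

best = outer_loop({"temperature": 0.9})
print(round(best["temperature"], 2))
```

The paper's claim is essentially that replacing the random `propose` with a trace-reading agent makes this loop competitive with hand-engineering.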
  4. * **Naive RAG:** Uses simple vector similarity for direct, fact-based queries.
    * **Multimodal RAG:** Retrieves information across various formats, including text, images, and audio.
    * **HyDE (Hypothetical Document Embeddings):** Generates a "fake" answer first to improve the retrieval of real documents.
    * **Corrective RAG:** Verifies retrieved data against trusted sources to ensure accuracy.
    * **Graph RAG:** Utilizes knowledge graphs to capture complex relationships between entities.
    * **Hybrid RAG:** Combines vector-based retrieval with graph-based methods for richer context.
    * **Adaptive RAG:** Dynamically switches between simple retrieval and complex reasoning based on the query.
    * **Agentic RAG:** Employs AI agents to manage complex workflows involving multiple tools and sources.
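The Adaptive RAG entry above can be sketched with a minimal query router. The keyword heuristic here is hypothetical; production systems typically use a trained classifier or an LLM call to choose the pipeline.

```python
def classify_query(query):
    """Toy router: treat queries with reasoning markers as multi-hop.
    Real adaptive systems learn this decision instead of hard-coding it."""
    multi_hop_markers = ("why", "compare", "relationship", "how does")
    if any(m in query.lower() for m in multi_hop_markers):
        return "complex"
    return "simple"

def answer(query):
    """Dispatch to a cheap vector lookup or an expensive reasoning
    pipeline depending on the route."""
    route = classify_query(query)
    if route == "simple":
        return f"[vector lookup] {query}"
    return f"[multi-step reasoning + retrieval] {query}"

print(answer("What year was Python released?"))
print(answer("Why does the cache invalidate early?"))
```

The payoff is cost control: direct factual queries skip the expensive agentic path entirely.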
  5. Dimension Reducers builds tools to formalize, stress-test, verify, and structure mathematical knowledge. They offer solutions for LLM training, automated refereeing, and retrieval that understands mathematical structure. Their platform includes tools for refereeing at scale, adversarial testing ("torture testing"), and structured Retrieval Augmented Generation (RAG).
    Key products include DiRe-JAX (a dimensionality reduction library), arXiv Math Semantic Search, arXiv Proof Audit Database, Mathematics Torture Chamber, and a Lean 4 Formalization Pipeline. They also publish research and benchmarks in mathematical formalization and OCR, emphasizing semantic accuracy and robustness.
  6. 1. **Retrieval-Augmented Generation (RAG):** Ground responses in trusted, retrieved data instead of relying on the model's memory.
    2. **Require Citations:** Demand sources for factual claims; withdraw any claim that lacks support.
    3. **Tool Calling:** Use LLMs to route requests to verified systems of record (databases, APIs) rather than generating facts directly.
    4. **Post-Generation Verification:** Employ a "judge" model to evaluate and score responses for factual accuracy, regenerating or refusing low-scoring outputs. Chain-of-Verification (CoVe) is highlighted.
    5. **Bias Toward Quoting:** Prioritize direct quotes over paraphrasing to reduce factual drift.
    6. **Calibrate Uncertainty:** Design for safe failure by incorporating confidence scoring, thresholds, and fallback responses.
    7. **Continuous Evaluation & Monitoring:** Track hallucination rates and other key metrics to identify and address performance degradation. User feedback loops are critical.
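Steps 4 and 6 combine naturally: a judge scores the draft, and a confidence threshold decides between answering and falling back. The sketch below uses naive substring matching as a stand-in judge; a real CoVe-style system would pose verification questions to a separate model.

```python
def judge(answer, sources):
    """Toy judge: fraction of answer sentences with verbatim support in
    the sources. A real deployment would use an LLM grader instead."""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    supported = sum(
        1 for s in sentences if any(s.lower() in src.lower() for src in sources)
    )
    return supported / len(sentences)

def respond(answer, sources, threshold=0.5):
    """Calibrated failure: below the threshold, fall back rather than
    emit an unsupported answer."""
    if judge(answer, sources) >= threshold:
        return answer
    return "I can't verify that from the available sources."

sources = ["The API rate limit is 100 requests per minute."]
print(respond("The API rate limit is 100 requests per minute.", sources))
print(respond("The API has no rate limit.", sources))
```

The threshold is the tunable knob: raising it trades coverage for a lower hallucination rate, which the monitoring in step 7 would track.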
  7. This article details building end-to-end observability for LLM applications using FastAPI and OpenTelemetry. It emphasizes a code-first approach, manually designing traces, spans, and semantic attributes to capture the full lifecycle of LLM-powered requests. The guide advocates for a structured approach to tracing RAG workflows, focusing on clear span boundaries, safe metadata capture (hashing prompts/responses), token usage tracking, and integration with observability backends like Jaeger, Grafana Tempo, or specialized LLM platforms. It highlights the importance of understanding LLM behavior beyond traditional infrastructure metrics.
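The span-design advice (clear boundaries, hashed prompts, token counts) can be shown without the OpenTelemetry dependency. The stdlib context manager below is only a stand-in for an OTel tracer's `start_as_current_span`, illustrating which attributes are worth recording; the span name and fields are hypothetical.

```python
import hashlib
import time
from contextlib import contextmanager

@contextmanager
def llm_span(name, spans, prompt):
    """Minimal span stand-in: records a name, a hashed prompt, and timing.
    Real code would use an OpenTelemetry tracer and export to a backend
    such as Jaeger or Grafana Tempo."""
    span = {
        "name": name,
        # Hash rather than store the raw prompt, so traces never leak PII.
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "start": time.monotonic(),
    }
    try:
        yield span
    finally:
        span["duration_s"] = time.monotonic() - span["start"]
        spans.append(span)

spans = []
with llm_span("rag.generate", spans, prompt="Summarize the design doc") as span:
    completion = "The doc proposes a two-tier cache."  # stand-in for the LLM call
    span["tokens_out"] = len(completion.split())       # real code reads usage from the API response

print(spans[0]["name"], spans[0]["tokens_out"])
```

Hashing keeps prompts correlatable across spans (same hash, same prompt) without the raw text ever entering the tracing backend.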
  8. RAG combines language models with external knowledge. This article explores context & retrieval in RAG, covering search methods (keywords, TF-IDF, embeddings/FAISS/Chroma), context length challenges (compression, re-ranking), and contextual retrieval (query & conversation history).
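A compact TF-IDF scorer, one of the keyword methods the article covers, fits in a few lines of stdlib Python (the toy corpus and function name are illustrative):

```python
import math
from collections import Counter

def tfidf_scores(query, docs):
    """Score documents against a query with classic TF-IDF. The article
    contrasts this keyword method with embedding search (FAISS/Chroma),
    which can match on meaning rather than exact word overlap."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter()                 # document frequency per term
    for doc in tokenized:
        df.update(set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append(sum(
            (tf[t] / len(doc)) * math.log(n / df[t])   # tf * idf
            for t in query.lower().split() if t in tf
        ))
    return scores

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock prices fell sharply",
]
scores = tfidf_scores("cat mat", docs)
print(docs[max(range(len(docs)), key=scores.__getitem__)])
```

The IDF term is what downweights ubiquitous words like "the", which is exactly the weakness plain keyword counting has.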
  9. This article discusses how to effectively utilize Large Language Models (LLMs) by acknowledging their superior processing capabilities and adapting prompting techniques. It emphasizes the importance of brevity, directness, and providing relevant context (through RAG and MCP servers) to maximize LLM performance. The article also highlights the need to treat LLM responses as drafts and use Socratic prompting for refinement, while acknowledging their potential for "hallucinations." It suggests formatting output expectations (JSON, Markdown) and utilizing role-playing to guide the LLM towards desired results. Ultimately, the author argues that LLMs, while not inherently "smarter" in a human sense, possess vast knowledge and can be incredibly powerful tools when approached strategically.
  10. Adafruit highlights the development of “pycoClaw,” a fully-featured AI agent implemented in MicroPython and running on a $5 ESP32-S3. This agent boasts capabilities like recursive tool calling, persistent memory using SD card storage, and a touchscreen UI, all built with an async architecture and optimized for performance through C user modules. The project is open-source and supports various hardware platforms, with ongoing development for RP2350, and is showcased alongside other Adafruit news including new product releases, community events, and resources for makers.
    2026-03-06 by klotz


SemanticScuttle - klotz.me: tagged with "retrieval-augmented generation"
