Nvidia’s NeMo Retriever models and RAG pipeline make quick work of ingesting PDFs and generating reports based on them. Chalk one up for the plan-reflect-refine architecture.
Sparse Priming Representations (SPR) is a research project focused on developing and sharing techniques for efficiently representing complex ideas, memories, or concepts using a minimal set of keywords, phrases, or statements, enabling language models or subject matter experts to quickly reconstruct the original idea with minimal context.
Scaling a simple RAG pipeline from short notes to full books. This post explains how to handle larger files in your RAG pipeline by adding one extra step to the process: chunking.
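A minimal sketch of that chunking step, assuming fixed-size character chunks with overlap (the function name and parameters below are illustrative, not taken from the post):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping character-based chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so adjacent chunks share context
    return chunks

# Each chunk is embedded and indexed separately, so retrieval can surface
# the relevant passage even from a book-length source.
book = open("book.txt", encoding="utf-8").read()  # hypothetical input file
chunks = chunk_text(book)
```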
An end-to-end raw-text-to-graph pipeline. This blog explores the limitations of LangChain extraction when using smaller quantized models, and how BAML can improve extraction success rates.
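The target of such a pipeline is a typed node/edge structure the model has to fill in. As a rough stand-in for the schemas the post defines (BAML uses its own schema language; the class and field names below are illustrative), here is a Pydantic sketch of that output shape:

```python
from pydantic import BaseModel

class Node(BaseModel):
    id: str
    label: str        # e.g. "Person", "Company"

class Edge(BaseModel):
    source: str       # Node.id
    target: str       # Node.id
    relation: str     # e.g. "works_for"

class KnowledgeGraph(BaseModel):
    nodes: list[Node]
    edges: list[Edge]

# A small quantized model is prompted to emit JSON matching this schema;
# a structured-extraction layer validates (and, where possible, repairs)
# the response before the graph is loaded into a graph store.
```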
This article details 10 open-source AI tools for developers, covering their benefits, features, and use cases. It emphasizes transparency, offline capabilities, and community support as key advantages of open-source AI.
| **Tool Name** | **Description** | **Key Features** | **What I Like About It** |
|---|---|---|---|
| **Talkd.ai** | Prototyping AI Agents | No-code, JSON/YAML config, API integration | Fast prototyping, no backend needed |
| **Marimo** | Python Notebooks for Apps | Reactive cells, version control, UI widgets | Stable, shareable, version-controlled apps |
| **Unsloth AI** | LLM Fine-Tuning | Memory-optimized training, supports Llama 3 | Accessible fine-tuning on modest hardware |
| **HackingBuddyGPT** | AI for Ethical Hacking | Offline operation, recon tools, payload generation | Offline security, privacy |
| **Giskard** | AI Testing & Debugging | Test case creation, continuous monitoring | Engineering discipline for AI quality |
| **OpenWebUI** | Self-Hosted ChatGPT UI | Local LLMs, plugin support, persistent memory | Privacy, local control |
| **Axolotl** | LLM Fine-Tuning | YAML config, supports QLoRA/LoRA/PEFT | Simplified fine-tuning, reproducibility |
| **FastRAG** | RAG Pipeline | Local operation, fast query times | Quick, lightweight RAG setup |
| **Nav2** | Robot Navigation Framework | Real-time obstacle detection, multi-robot coordination | Flexible, modern ROS 2 integration |
| **MindsDB** | Machine Learning in Database | SQL-based training/inference, supports various DBs | Easy integration with existing SQL workflows |
This article discusses the importance of knowledge graphs in providing context for AI agents, highlighting their advantages over traditional retrieval systems in terms of precision, reasoning, and explainability.
MarkItDown is an open-source Python utility that simplifies converting diverse file formats into Markdown, designed to prepare data for LLMs and RAG systems. It handles various file types, preserves document structure, and integrates with LLMs for tasks like image description.
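A minimal usage sketch based on MarkItDown's documented API; the file name is illustrative and parameter details may differ between versions:

```python
from markitdown import MarkItDown  # pip install markitdown

md = MarkItDown()
result = md.convert("quarterly_report.pdf")  # also handles .docx, .xlsx, .html, images, ...

# The resulting Markdown keeps headings, lists, and tables, ready to be
# chunked and embedded for an LLM or RAG pipeline.
print(result.text_content)
```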
IBM announces Granite 3.3, featuring a new speech-to-text model (Granite Speech 3.3 8B), enhanced reasoning capabilities in Granite 3.3 8B Instruct, and RAG-focused LoRA adapters for Granite 3.2. The release also includes activated LoRAs (aLoRAs) for improved efficiency, and all models are open source.
This article details the often-overlooked cost of storing embeddings for RAG systems and shows how quantization techniques (int8 and binary) can significantly reduce storage requirements and improve retrieval speed without substantial accuracy loss.
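A NumPy sketch of the two quantization schemes the article discusses; the helper names and the example corpus size are illustrative:

```python
import numpy as np

def quantize_int8(embeddings: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Scale float32 embeddings into int8, keeping per-dimension scales for dequantization."""
    scales = np.abs(embeddings).max(axis=0) / 127.0 + 1e-12
    return (embeddings / scales).round().astype(np.int8), scales

def quantize_binary(embeddings: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    return np.packbits(embeddings > 0, axis=-1)

emb = np.random.randn(10_000, 768).astype(np.float32)   # ~30 MB as float32
int8_emb, scales = quantize_int8(emb)                    # ~7.7 MB, 4x smaller
binary_emb = quantize_binary(emb)                        # ~1 MB, 32x smaller
```

Binary vectors are typically used for a fast first-pass search, with int8 or float vectors re-scoring the top candidates to recover most of the lost accuracy.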
This article details building a Retrieval-Augmented Generation (RAG) system to assist with research paper tasks, specifically question answering over a PDF document. It covers document loading, splitting, embedding with Sentence Transformers, using ChromaDB as a vector database, and implementing a query interface with LangChain.
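A condensed sketch of the embed-and-retrieve core of such a system, calling Sentence Transformers and ChromaDB directly rather than through the article's LangChain wrappers; the chunks and question are placeholders:

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Hypothetical pre-chunked paper text; in the article the chunks come from a PDF loader and splitter.
chunks = ["Transformers use self-attention ...", "We evaluate on GLUE ...", "Ablations show ..."]

model = SentenceTransformer("all-MiniLM-L6-v2")
collection = chromadb.Client().create_collection("paper")
collection.add(
    ids=[str(i) for i in range(len(chunks))],
    documents=chunks,
    embeddings=model.encode(chunks).tolist(),
)

question = "What benchmark is the model evaluated on?"
hits = collection.query(query_embeddings=model.encode([question]).tolist(), n_results=2)
print(hits["documents"][0])  # retrieved chunks to feed, with the question, into an LLM prompt
```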