IBM has introduced Granite 4.0 3B Vision, a specialized vision-language model (VLM) engineered for high-fidelity enterprise document data extraction. Unlike monolithic multimodal models, this release uses a modular LoRA adapter architecture, adding approximately 0.5B parameters to the Granite 4.0 Micro base model. This design allows for efficient dual-mode deployment, activating vision capabilities only when multimodal processing is required. The model excels at converting complex visual elements, such as charts and tables, into structured machine-readable formats like JSON, HTML, and CSV. By utilizing a high-resolution tiling mechanism and a DeepStack architecture for improved spatial alignment, Granite 4.0 3B Vision achieves impressive accuracy in tasks like Key-Value Pair extraction and chart reasoning, ranking highly on industry benchmarks.
A symposium dedicated to Invisible XML (ixml), a language and process for identifying structure in documents. The event will be held online on February 26–27, 2026, and attendance is free. The schedule includes presentations on various aspects of ixml implementation, syntax, and applications.
Rensa is a high-performance MinHash suite written in Rust with Python bindings, designed for efficient similarity estimation and deduplication of large datasets. It offers R-MinHash, C-MinHash, and OptDensMinHash variants that are significantly faster than datasketch while maintaining comparable accuracy.
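Rensa's variants build on the classic MinHash technique: compare short signatures instead of full sets to estimate Jaccard similarity. A minimal pure-Python sketch of that underlying idea (illustrative only, not Rensa's Rust implementation or its Python API):

```python
import random

def minhash_signature(items, num_perm=128, seed=1):
    """Build a MinHash signature: one slot per simulated permutation."""
    rng = random.Random(seed)
    p = (1 << 61) - 1  # large Mersenne prime for universal hashing
    # Each (a, b) pair defines a hash function h(x) = (a*x + b) % p.
    params = [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(num_perm)]
    hashes = [hash(it) & 0xFFFFFFFF for it in items]
    # Keep the minimum hashed value under each function.
    return [min((a * h + b) % p for h in hashes) for a, b in params]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

With 128 permutations the estimate is typically within a few percentage points of the true Jaccard similarity; Rensa's value is doing this (and denser variants like OptDensMinHash) at Rust speed over large corpora.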
Docling simplifies document processing, parsing diverse formats (including advanced PDF understanding) and providing seamless integrations with the gen AI ecosystem.
This tutorial demonstrates how to build a powerful document search engine using Hugging Face embeddings, Chroma DB, and Langchain for semantic search capabilities.
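The core of such a pipeline is embedding documents and ranking them by vector similarity to a query. As a self-contained stand-in (toy term-frequency vectors instead of Hugging Face embeddings, and a plain list instead of Chroma DB), the retrieval step looks like:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: a term-frequency vector. A real pipeline would call a
    # Hugging Face sentence-embedding model here instead.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def search(query, docs, k=2):
    # Rank stored documents by similarity to the query vector, which is
    # what a vector store like Chroma does efficiently at scale.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

Swapping the toy embedding for a real model and the list for a vector store gives semantic rather than keyword matching, which is the point of the tutorial's stack.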
This article introduces the pyramid search approach using Agentic Knowledge Distillation to address the limitations of traditional RAG strategies in document ingestion.
The pyramid structure allows for multi-level retrieval, including atomic insights, concepts, abstracts, and recollections. This structure mimics a knowledge graph but uses natural language, making it more efficient for LLMs to interact with.
**Knowledge Distillation Process**:
- **Conversion to Markdown**: Documents are converted to Markdown for better token efficiency and processing.
- **Atomic Insights Extraction**: Each page is processed using a two-page sliding window to generate a list of insights in simple sentences.
- **Concept Distillation**: Higher-level concepts are identified from the insights to reduce noise and preserve essential information.
- **Abstract Creation**: An LLM writes a comprehensive abstract for each document, capturing dense information efficiently.
- **Recollections/Memories**: Critical information useful across all tasks is stored at the top of the pyramid.
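The distillation steps above can be sketched as a layered store that supports multi-level retrieval. The level names follow the article; the data layout and retrieval function are illustrative assumptions, not the article's actual implementation:

```python
# Pyramid levels, top (most general) to bottom (most specific).
LEVELS = ["recollections", "abstracts", "concepts", "insights"]

def build_pyramid():
    # Each level maps a document id to its distilled entries.
    return {level: {} for level in LEVELS}

def add_document(pyramid, doc_id, insights, concepts, abstract, recollections=()):
    pyramid["insights"][doc_id] = list(insights)
    pyramid["concepts"][doc_id] = list(concepts)
    pyramid["abstracts"][doc_id] = abstract
    pyramid["recollections"][doc_id] = list(recollections)

def retrieve(pyramid, doc_id, depth="concepts"):
    # Walk top-down, collecting everything at or above the requested depth.
    # An agent would choose the depth per query: cheap recollections and
    # abstracts for broad questions, atomic insights for detailed ones.
    out = {}
    for level in LEVELS:
        out[level] = pyramid[level].get(doc_id)
        if level == depth:
            break
    return out
```

Because every layer is plain natural language, an LLM can consume any slice of the pyramid directly, without the query translation a formal knowledge graph would require.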
A detailed guide on how to install DumbPad on your Synology NAS using Docker & Portainer. Learn how to set up and customize DumbPad for note-taking with auto-save and dark mode support.
Send once, read anywhere. Convert and send documents to your Kindle library or specific devices. Supported file types include PDF, DOCX, TXT, etc., with a max file size of 200 MB.
MarkItDown is a utility for converting various files to Markdown, including PDF, PowerPoint, Word, Excel, Images, Audio, HTML, text-based formats, and ZIP files.
MegaParse is an open-source tool for parsing and converting various types of documents for ingestion into LLMs. It supports multiple formats, including text, PDF, PowerPoint, Excel, CSV, and Word documents, and offers customizable output formats to meet different LLM requirements, making it a versatile and efficient solution for LLM data preparation.