Virgil.Dev is a tool that parses GitHub repositories into structured code graphs, extracting elements such as functions, classes, imports, and cross-file references across ten programming languages. Unlike traditional text-based search, it answers queries with exact structural results from an indexed code graph, enabling faster and more accurate code understanding. Users can explore their code through the Model Context Protocol (MCP), an AI chat interface with built-in tools, or a dedicated CLI for local parsing and querying. Pricing ranges from a free tier to developer plans.
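Virgil.Dev's own parser, graph schema, and query interface are not described here. As a rough illustration of what a "structured code graph" with cross-file references means, the following is a minimal single-language sketch using Python's standard-library `ast` module. `CodeGraph` and `add_module` are hypothetical names, and the sketch only records top-level definitions, `import` / `from ... import` edges, and direct calls by simple name.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class CodeGraph:
    """Toy code graph (hypothetical schema): definitions plus import and call edges."""
    functions: set[str] = field(default_factory=set)
    classes: set[str] = field(default_factory=set)
    imports: set[tuple[str, str]] = field(default_factory=set)  # (module, imported target)
    calls: set[tuple[str, str]] = field(default_factory=set)    # (caller, callee)


def add_module(graph: CodeGraph, module: str, source: str) -> None:
    """Parse one Python module and add its top-level definitions and edges to graph."""
    tree = ast.parse(source)
    # Local name -> fully qualified name, from top-level defs and `from x import y`.
    scope: dict[str, str] = {}
    for node in tree.body:
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                target = f"{node.module}.{alias.name}"
                scope[alias.asname or alias.name] = target
                graph.imports.add((module, target))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                graph.imports.add((module, alias.name))
        elif isinstance(node, ast.ClassDef):
            scope[node.name] = f"{module}.{node.name}"
            graph.classes.add(scope[node.name])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            scope[node.name] = f"{module}.{node.name}"
            graph.functions.add(scope[node.name])
    # Second pass: resolve direct calls by simple name. A call to an imported
    # name becomes a cross-file reference to a definition in another module.
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            caller = f"{module}.{node.name}"
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph.calls.add((caller, scope.get(inner.func.id, inner.func.id)))


# Usage: two "files", where app.main calls a function imported from util.
graph = CodeGraph()
add_module(graph, "util", "def helper(x):\n    return x * 2\n")
add_module(graph, "app", "from util import helper\n\nclass App:\n    pass\n\ndef main():\n    return helper(21)\n")
print(sorted(graph.calls))  # [('app.main', 'util.helper')]
```

Once the graph is built, a question like "who calls `util.helper`?" becomes an exact lookup over `graph.calls` rather than a text search for the string `helper`, which is the kind of structural precision the paragraph above describes.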
Docling is a tool that parses documents and exports them to formats such as Markdown and JSON. It supports a range of input document formats, provides advanced PDF understanding and metadata extraction, and integrates with LlamaIndex and LangChain for retrieval-augmented generation (RAG) and question-answering (QA) applications.
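In Docling itself, conversion goes through its `DocumentConverter` class (`DocumentConverter().convert(source)`), and the resulting document can be exported with `export_to_markdown()`. The toy sketch below does not use Docling. It only illustrates why one parsed structure can feed both export formats, using a hypothetical `Element` schema that is far simpler than Docling's real document model.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Element:
    """One parsed block of a document (hypothetical schema, not Docling's)."""
    kind: str                              # "heading", "paragraph", or "table"
    text: str = ""
    level: int = 1                         # heading depth, used only for headings
    rows: Optional[List[List[str]]] = None  # table cells, first row is the header


def to_markdown(elements: List[Element]) -> str:
    """Render the parsed blocks as Markdown, one blank line between blocks."""
    lines: List[str] = []
    for el in elements:
        if el.kind == "heading":
            lines.append("#" * el.level + " " + el.text)
        elif el.kind == "paragraph":
            lines.append(el.text)
        elif el.kind == "table" and el.rows:
            header, *body = el.rows
            lines.append("| " + " | ".join(header) + " |")
            lines.append("|" + "---|" * len(header))  # header separator row
            for row in body:
                lines.append("| " + " | ".join(row) + " |")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def to_json(elements: List[Element]) -> str:
    """Serialize the same blocks losslessly as JSON."""
    return json.dumps([asdict(el) for el in elements], indent=2)


# Usage with made-up example content.
doc = [
    Element("heading", "Summary", level=2),
    Element("paragraph", "Parsed from a PDF."),
    Element("table", rows=[["Field", "Value"], ["pages", "3"]]),
]
print(to_markdown(doc))
print(to_json(doc))
```

Markdown is convenient for feeding text to an LLM, while JSON keeps the element types and table cells intact for downstream processing.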
The llmsherpa project provides APIs to accelerate Large Language Model (LLM) projects. Its features include LayoutPDFReader for parsing PDF text, smart chunking for vector search and Retrieval Augmented Generation (RAG), and table analysis. It is open source under the Apache 2.0 license.
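llmsherpa's actual chunking rules (applied to documents read through LayoutPDFReader) are not reproduced here. The sketch below shows one common form of layout-aware chunking, assumed for illustration: paragraphs are grouped by section, split when a size budget is exceeded, and each chunk is prefixed with its section-header path so it still makes sense when retrieved on its own from a vector index. `Block`, `smart_chunks`, and `max_chars` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A layout block from a parsed PDF (hypothetical schema)."""
    kind: str       # "header" or "para"
    text: str
    level: int = 0  # nesting depth for headers (0 = top level)


def smart_chunks(blocks: List[Block], max_chars: int = 500) -> List[str]:
    """Group paragraphs by section and prefix each chunk with its header path."""
    path: List[str] = []    # current stack of section headers
    chunks: List[str] = []
    buf: List[str] = []     # paragraphs waiting to be emitted

    def flush() -> None:
        if buf:
            header = " > ".join(path)
            body = "\n".join(buf)
            chunks.append(f"{header}\n{body}" if header else body)
            buf.clear()

    for b in blocks:
        if b.kind == "header":
            flush()               # a new section closes the previous chunk
            del path[b.level:]    # drop headers at this depth or deeper
            path.append(b.text)
        else:
            # Start a new chunk if this paragraph would exceed the size budget.
            if buf and sum(len(p) for p in buf) + len(b.text) > max_chars:
                flush()
            buf.append(b.text)
    flush()
    return chunks


# Usage with made-up blocks.
blocks = [
    Block("header", "Installation", 0),
    Block("para", "Install with pip."),
    Block("header", "Requirements", 1),
    Block("para", "Python 3.9 or later."),
    Block("header", "Usage", 0),
    Block("para", "Call the reader."),
]
for chunk in smart_chunks(blocks):
    print(chunk, end="\n---\n")
```

Here the "Requirements" chunk is emitted as `Installation > Requirements` followed by its paragraph, so a retriever matching it also sees which section it came from, which a fixed-size character split would lose.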