Docling is a tool that parses documents and exports them to desired formats like Markdown and JSON. It supports various document formats and provides advanced PDF understanding, metadata extraction, and integration with LlamaIndex and LangChain for RAG / QA applications.
The llmsherpa project provides APIs to accelerate Large Language Model (LLM) projects. It includes features like LayoutPDFReader for PDF text parsing, smart chunking for vector search and Retrieval Augmented Generation, and table analysis. It is open-sourced under Apache 2.0 license.