MarkItDown is an open-source Python utility that converts diverse file formats into Markdown, designed to prepare data for LLMs and RAG systems. It preserves document structure and integrates with LLMs for tasks like image description.
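A minimal sketch of the conversion step, assuming the `markitdown` package is installed and using its documented `MarkItDown().convert(...)` call, whose result exposes the Markdown as `text_content`. The helper name `file_to_markdown` and its `converter` parameter are hypothetical, added only so the example can be run with a stand-in object.

```python
from typing import Any, Optional


def file_to_markdown(path: str, converter: Optional[Any] = None) -> str:
    """Convert one file to Markdown text (hypothetical helper, not part of MarkItDown)."""
    if converter is None:
        # Imported lazily so the helper can also be exercised with a stand-in converter.
        from markitdown import MarkItDown

        converter = MarkItDown()
    # convert() returns a result object; the Markdown string is in text_content.
    result = converter.convert(path)
    return result.text_content


# Example call (assumes a local file named report.docx exists):
# print(file_to_markdown("report.docx"))
```

The returned string can then be chunked or passed to an LLM prompt as-is.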
Docling is an open-source library for document processing that supports diverse formats, offers advanced PDF understanding, and integrates with the generative AI ecosystem.
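For comparison, a comparable sketch with Docling, assuming the `docling` package is installed and using its `DocumentConverter` class with `export_to_markdown()` on the converted document. As above, the helper name `document_to_markdown` and the `converter` parameter are hypothetical and exist only for illustration.

```python
from typing import Any, Optional


def document_to_markdown(source: str, converter: Optional[Any] = None) -> str:
    """Convert a document (local path or URL) to Markdown (hypothetical helper)."""
    if converter is None:
        # Imported lazily; the first real conversion may download models.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()
    # convert() returns a result whose .document holds the parsed structure.
    result = converter.convert(source)
    return result.document.export_to_markdown()


# Example call (assumes a local file named paper.pdf exists):
# print(document_to_markdown("paper.pdf"))
```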