A toolkit for training language models to work with PDF documents in the wild, including prompting strategies, evaluation tools, filtering, finetuning code, and processing PDFs through finetuned models.
The article discusses the process of preparing PDFs for use in Retrieval-Augmented Generation (RAG) systems, with a focus on creating graph-based RAGs from annual reports containing tables. It highlights the benefits of Graph RAGs over vector store-backed RAGs, particularly in terms of reasoning capabilities, and explores the construction of knowledge graphs for better information retrieval. The author shares insights into the challenges and solutions involved in building an enterprise-ready graph data store for RAG applications.
MarkItDown is a utility for converting various files to Markdown, including PDF, PowerPoint, Word, Excel, Images, Audio, HTML, text-based formats, and ZIP files.
How to read and convert PDFs to Markdown for better RAG results with LLMs.
Docling is a tool that parses documents and exports them to desired formats like Markdown and JSON. It supports various document formats and provides advanced PDF understanding, metadata extraction, and integration with LlamaIndex and LangChain for RAG / QA applications.
Docling is a tool that parses documents and exports them to desired formats like Markdown and JSON. It supports various document formats including PDF, DOCX, PPTX, Images, HTML, AsciiDoc, and Markdown.
A guided series of tutorials/notebooks to build a PDF to Podcast workflow using Llama models for text processing, transcript writing, dramatization, and text-to-speech conversion.
This blog post explores scaling ColPali for efficient document retrieval across large collections of PDFs using Vespa's phased retrieval and ranking pipeline, including the use of a Hamming-based MaxSim similarity function.
IncarnaMind enables chatting with personal documents (PDF, TXT) using Large Language Models (LLMs) like GPT. It uses a Sliding Window Chunking mechanism and Ensemble Retriever for efficient querying.
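For readers unfamiliar with the term, the general idea behind sliding-window chunking can be sketched as below. This is a generic, fixed-size illustration only and is not IncarnaMind's implementation, which may size its windows differently; the function name `sliding_window_chunks` and the parameters `window_size` and `overlap` are hypothetical names chosen for this example.

```python
def sliding_window_chunks(text, window_size=200, overlap=50):
    """Split text into overlapping chunks of whitespace-separated words.

    Hypothetical helper for illustration; not taken from IncarnaMind.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in [0, window_size)")

    words = text.split()
    step = window_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + window_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = "a b c d e f g h i j"
    for chunk in sliding_window_chunks(sample, window_size=4, overlap=2):
        print(chunk)
```

Consecutive chunks share `overlap` words, so a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of indexing some text twice.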