klotz: github* + pdf*

  1. Docling is a tool that parses documents and exports them to desired formats like Markdown and JSON. It supports various document formats and provides advanced PDF understanding, metadata extraction, and integration with LlamaIndex and LangChain for RAG / QA applications.
    2024-11-01 by klotz
  2. IncarnaMind enables chatting with personal documents (PDF, TXT) using Large Language Models (LLMs) like GPT. It uses a Sliding Window Chunking mechanism and Ensemble Retriever for efficient querying.
    2024-08-09 by klotz
  3. The llmsherpa project provides APIs to accelerate Large Language Model (LLM) projects. It includes LayoutPDFReader for PDF text parsing, smart chunking for vector search and Retrieval Augmented Generation, and table analysis. It is open source under the Apache 2.0 license.
    2023-06-25 by klotz
  4. pdfocr adds an OCR text layer to scanned PDF files, allowing them to be searched. It currently depends on Ruby 1.8.7 or above, and uses ocropus, cuneiform, or tesseract for performing OCR.

    To use, run:

    pdfocr -i input.pdf -o output.pdf
    2015-02-19 by klotz
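
The sliding window chunking mentioned in the IncarnaMind entry can be illustrated with a minimal sketch. This is a generic word-based version, not IncarnaMind's actual implementation; the function name `sliding_window_chunks` and the `window_size` and `overlap` parameters are hypothetical and chosen only for illustration.

```python
def sliding_window_chunks(text, window_size=200, overlap=50):
    """Split text into chunks of up to `window_size` words.

    Consecutive chunks share `overlap` words, so a sentence that falls on a
    chunk boundary still appears whole in at least one chunk.
    Hypothetical helper: not taken from IncarnaMind's code base.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in [0, window_size)")

    words = text.split()
    step = window_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window_size]))
        # Stop once the current window has reached the end of the text.
        if start + window_size >= len(words):
            break
    return chunks
```

A larger overlap tends to improve recall for queries that span chunk boundaries, at the cost of storing and searching more text.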
