Tags: ocr* + pdf*

  1. Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
    2025-05-25 by klotz
  2. This article details a method for converting PDFs to Markdown using a local LLM (Gemma 3 via Ollama), focusing on privacy and efficiency. It involves rendering PDF pages as images and then using the LLM for content extraction, even from scanned PDFs.
    2025-04-16 by klotz
  3. A toolkit for training language models to work with PDF documents in the wild, including prompting strategies, evaluation tools, filtering, finetuning code, and processing PDFs through finetuned models.
  4. Docling is a tool that parses documents and exports them to desired formats like Markdown and JSON. It supports various document formats including PDF, DOCX, PPTX, Images, HTML, AsciiDoc, and Markdown.
    2024-11-01 by klotz
  5. apt install tesseract-ocr-fra
    2024-03-04 by klotz
  6. 2016-06-23 by klotz
  7. pdfocr adds an OCR text layer to scanned PDF files, allowing them to be searched. It currently depends on Ruby 1.8.7 or above, and uses ocropus, cuneiform, or tesseract for performing OCR.

    To use, run:

    pdfocr -i input.pdf -o output.pdf
    2015-02-19 by klotz
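
The pdfocr invocation and the Tesseract language-pack install quoted in the list can be scripted for a whole folder of scans. The following is a minimal sketch, not part of any of the bookmarked tools: the helper names (`tesseract_lang_package`, `pdfocr_command`, `plan_batch`, `run_batch`) are hypothetical, the only pdfocr flags used are the `-i` and `-o` shown above, and the `tesseract-ocr-<code>` package naming is assumed from the `tesseract-ocr-fra` example.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_lang_package(lang_code: str) -> str:
    """Package name for a Tesseract language pack, following the
    'tesseract-ocr-fra' pattern from the listing (assumed to generalize)."""
    return f"tesseract-ocr-{lang_code}"


def pdfocr_command(src: Path, dst: Path) -> list[str]:
    """Build the invocation quoted in the listing: pdfocr -i IN -o OUT."""
    return ["pdfocr", "-i", str(src), "-o", str(dst)]


def plan_batch(in_dir: Path, out_dir: Path) -> list[list[str]]:
    """Return one pdfocr command per PDF in in_dir, in sorted order,
    writing each result to out_dir under the same file name."""
    return [pdfocr_command(p, out_dir / p.name) for p in sorted(in_dir.glob("*.pdf"))]


def run_batch(in_dir: Path, out_dir: Path) -> None:
    """Execute the planned commands; fails early if pdfocr is not installed."""
    if shutil.which("pdfocr") is None:
        raise RuntimeError("pdfocr not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in plan_batch(in_dir, out_dir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `plan_batch(Path("scans"), Path("searchable"))` only builds the command lists, so the plan can be inspected before `run_batch` touches any files.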
