SemanticScuttle - klotz.me

Tags: ocr* + pdf*


Docling

Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.

2025-05-25 Tags: document, pdf, ocr, github, ibm, conversion by klotz
From PDF to Markdown with Local LLMs — Fast, Private, and Free

This article details a method for converting PDFs to Markdown using a local LLM (Gemma 3 via Ollama), focusing on privacy and efficiency. It involves rendering PDF pages as images and then using the LLM for content extraction, even from scanned PDFs.
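The same pipeline can be sketched from the shell. This is a minimal sketch, not the article's code. It makes the following substitutions and assumptions:

- It uses `pdftoppm` from poppler-utils to render pages, instead of the PyMuPDF/Pillow approach the article tags.
- It assumes `curl` and `jq` are installed.
- It assumes an Ollama server is running on its default `localhost:11434`, with a vision-capable Gemma 3 model pulled under the tag `gemma3`.
- The helper names `ollama_payload` and `pdf_to_markdown` and the prompt wording are hypothetical.

Each page image is sent base64-encoded in the `images` field of an `/api/generate` request, with `stream` set to false. The reply's `response` field is printed.

```shell
#!/usr/bin/env bash
# Sketch only: PDF pages -> PNG images -> local vision model (Ollama) -> Markdown.
# Assumed tools: pdftoppm (poppler-utils), base64, curl, jq.
# Assumed server: Ollama on localhost:11434 with a vision model tagged "gemma3".
# ollama_payload and pdf_to_markdown are hypothetical helper names.

OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434/api/generate}"
OLLAMA_MODEL="${OLLAMA_MODEL:-gemma3}"
PAGE_PROMPT="Transcribe this page to Markdown. Keep headings, lists and tables. Output only Markdown."

# Print the JSON body for one /api/generate call carrying one page image.
# Line breaks are removed so the image is a single unbroken base64 string.
ollama_payload() {
  local model="$1" prompt="$2" image="$3"
  base64 < "$image" | tr -d '\n' |
    jq -R -s -c --arg model "$model" --arg prompt "$prompt" \
      '{model: $model, prompt: $prompt, images: [.], stream: false}'
}

# Write Markdown for every page of a PDF to stdout, pages separated by blank lines.
pdf_to_markdown() {
  local pdf="$1" dpi="${2:-150}" tmp page
  [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
  tmp="$(mktemp -d)" || return 1
  # Render pages to PNG; scanned PDFs work too because the model reads pixels.
  if ! pdftoppm -r "$dpi" -png "$pdf" "$tmp/page"; then
    rm -rf "$tmp"; return 1
  fi
  # pdftoppm zero-pads page numbers, so glob order matches page order.
  for page in "$tmp"/page-*.png; do
    ollama_payload "$OLLAMA_MODEL" "$PAGE_PROMPT" "$page" |
      curl -s --data-binary @- -H 'Content-Type: application/json' "$OLLAMA_URL" |
      jq -r '.response'
    printf '\n'
  done
  rm -rf "$tmp"
}
```

Usage: `pdf_to_markdown scan.pdf 200 > scan.md` renders at 200 dpi. A higher dpi gives the model more legible text, at the cost of larger requests.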

2025-04-16 Tags: pdf, markdown, llm, self-hosted, gemma, ollama, ocr, pymupdf, pillow by klotz
olmOCR: Toolkit for Training Language Models to Work with PDF Documents

A toolkit for training language models to work with PDF documents in the wild. It includes prompting strategies, evaluation tools, filtering, fine-tuning code, and tooling for processing PDFs through the fine-tuned models.

2025-02-28 Tags: pdf, llm, pdf processing, olmocr, allenai, ocr, document management, document conversion by klotz
DS4SD / docling

Docling is a tool that parses documents and exports them to formats such as Markdown and JSON. It supports various input formats, including PDF, DOCX, PPTX, images, HTML, AsciiDoc, and Markdown.
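Here is a minimal sketch of batch conversion with the `docling` command-line tool, assuming it was installed with `pip install docling`. The `--to` and `--output` options follow the docling CLI documentation as we understand it. They may differ between versions, so check `docling --help`. The helper name `docling_dir` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert every PDF in a directory to Markdown and JSON with docling.
# Assumes the docling CLI is on PATH; docling_dir is a hypothetical helper.
docling_dir() {
  local src="$1" out="${2:-./docling-out}" f
  [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
  command -v docling >/dev/null 2>&1 || { echo "docling not installed" >&2; return 127; }
  mkdir -p "$out" || return 1
  for f in "$src"/*.pdf; do
    [ -e "$f" ] || continue  # glob matched nothing: no PDFs in $src
    # Expected to write one file per requested format, named after the input, into $out.
    docling "$f" --to md --to json --output "$out" || echo "failed: $f" >&2
  done
}
```

Usage: `docling_dir ~/scans ./markdown`. A failed file is reported on stderr, and the loop continues with the rest.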

2024-11-01 Tags: docling, ibm, document, parsing, markdown, json, pdf, docx, pptx, ocr, llm by klotz
Convert a scanned pdf to text with Linux command line using OCRmyPDF | by Chi Thuc Nguyen | Medium

OCRmyPDF runs Tesseract for recognition; French documents need Tesseract's French language data:

apt install tesseract-ocr-fra
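With the French data from the command above installed, the OCR step itself can be sketched as a small wrapper around `ocrmypdf`. This sketch assumes OCRmyPDF is installed. The options it uses are these:

- `-l` selects the Tesseract language.
- `--skip-text` leaves pages that already have text alone.
- `--sidecar` writes the recognized text to a separate file.

The helper names `ocr_output_name` and `ocr_fr` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: make a scanned French PDF searchable and extract its text with OCRmyPDF.
# Assumes ocrmypdf and the tesseract-ocr-fra language data are installed.
# ocr_output_name and ocr_fr are hypothetical helper names.

# scans/letter.pdf -> scans/letter.ocr.pdf
ocr_output_name() {
  printf '%s\n' "${1%.pdf}.ocr.pdf"
}

ocr_fr() {
  local in="$1" out
  [ -f "$in" ] || { echo "no such file: $in" >&2; return 1; }
  out="$(ocr_output_name "$in")"
  # -l fra      : recognize French (Tesseract language code)
  # --skip-text : leave pages that already contain text untouched
  # --sidecar   : also write the recognized plain text to a .txt file
  ocrmypdf -l fra --skip-text --sidecar "${out%.pdf}.txt" "$in" "$out"
}
```

For documents that mix languages, Tesseract accepts a `+`-joined list such as `-l fra+eng`. Each of those language packs must be installed.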

2024-03-04 Tags: ocr, pdf, text, cli, linux, mac by klotz
Online Searchable PDF creator

2016-06-23 Tags: pdf, ocr, document conversion, search by klotz
gkovacs/pdfocr · GitHub

pdfocr adds an OCR text layer to scanned PDF files, allowing them to be searched. It currently depends on Ruby 1.8.7 or above, and uses ocropus, cuneiform, or tesseract for performing OCR.

To use, run:

pdfocr -i input.pdf -o output.pdf
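To process a whole folder, the `pdfocr -i ... -o ...` invocation above can be wrapped in a loop. The loop also checks that each result actually gained a searchable text layer. This is a sketch that assumes `pdfocr` is on PATH and uses `pdftotext` from poppler-utils for the check. The helper names `has_text_layer` and `pdfocr_all`, and the `-ocr.pdf` output suffix, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run pdfocr over every PDF in the current directory, then check
# that the first page of each output has extractable text.
# Assumes pdfocr and pdftotext (poppler-utils) are installed.

# Succeeds only if page 1 of the PDF yields some non-whitespace text.
has_text_layer() {
  [ -n "$(pdftotext -l 1 "$1" - 2>/dev/null | tr -d '[:space:]')" ]
}

pdfocr_all() {
  local f out
  for f in ./*.pdf; do
    [ -e "$f" ] || continue                    # no PDFs here
    case "$f" in *-ocr.pdf) continue ;; esac   # skip earlier outputs
    out="${f%.pdf}-ocr.pdf"
    pdfocr -i "$f" -o "$out" || { echo "pdfocr failed: $f" >&2; continue; }
    has_text_layer "$out" || echo "no text found in $out" >&2
  done
}
```

Only page 1 is checked, to keep the loop fast. Remove `-l 1` to scan every page.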

2015-02-19 Tags: pdf, ocr, github, ruby, tesseract by klotz


