Tags: ocr* + llm*


  1. MinerU is a tool that converts PDFs into machine-readable formats such as Markdown and JSON, so their content can be extracted for downstream use.
  2. | **Model** | **Parameters (B)** | **Main Strength** | **Special Capabilities** | **Best Use Case** |
    |----------------------|--------------------|------------------------------|-------------------------------------------------------|---------------------------------------------------|
    | olmOCR-2-7B-1025 | 7 | High-accuracy document OCR | GRPO RL training, equation/table OCR | Large-scale document pipelines, technical PDFs |
    | PaddleOCR v5/VL | 1 | Multilingual parsing (109 langs) | Text, tables, formulas, charts, dynamic visual encoder | Global multilingual OCR, efficient inference |
    | OCRFlux-3B | 3 | Markdown-accurate parsing | Cross-page merging, vLLM optimization | PDF-to-Markdown, consumer GPU friendly |
    | MiniCPM-V 4.5 | 8 | State-of-the-art multimodal OCR | Video OCR, high-resolution images, fast/deep modes | Mobile/edge OCR, video understanding |
    | InternVL 2.5-4B | 4 | Efficient OCR & reasoning | Dynamic tiling, strong text extraction | Resource-limited environments, multi-image/video |
    | Granite Vision 3.3 2b | 2 | Visual document understanding | Charts, tables, diagrams, segmentation, multi-page QA | Enterprise document extraction |
    | TrOCR Large Printed | 0.6 | Clean printed-text OCR | 16x16 patch encoder, BEiT/RoBERTa | Simple, high-quality printed text extraction |
    2025-12-27, by klotz
  3. This article details a method for converting PDFs to Markdown using a local LLM (Gemma 3 via Ollama), focusing on privacy and efficiency. It involves rendering PDF pages as images and then using the LLM for content extraction, even from scanned PDFs.
    2025-04-16, by klotz
  4. Machine learning models can now accurately replicate cuneiform characters from photos of ancient tablets, making complex scripts easier to read. The ProtoSnap approach aligns a prototype character with its individual variations on tablets, enabling precise reproduction. This improves optical character recognition, particularly the identification of rare and varied characters, and could significantly increase the availability of ancient texts for analysis.
    2025-03-05, by klotz
  5. A toolkit for training language models to work with PDF documents in the wild, including prompting strategies, evaluation tools, filtering, finetuning code, and processing PDFs through finetuned models.
  6. Qwen2.5-VL, the latest vision-language model from Qwen, showcases enhanced image recognition, agentic behavior, video comprehension, document parsing, and more. It outperforms previous models in various benchmarks and tasks, offering improved efficiency and performance.
  7. Microsoft has open-sourced MarkItDown, a tool that converts various file types into Markdown. It supports PDFs, PowerPoint presentations, Word documents, Excel spreadsheets, images, audio, HTML, text-based formats, and ZIP files.
  8. A guide on how to understand and read bank statements effectively, highlighting key components and terms, and discussing the importance for financial management and fraud prevention.
  9. This article guides readers through building an OCR application in Python using the Llama 3.2-Vision model run through Ollama. It covers setting up the environment, installing the necessary tools, and writing the OCR script.
    2024-11-21, by klotz
  10. Docling is a tool that parses documents and exports them to desired formats like Markdown and JSON. It supports various document formats including PDF, DOCX, PPTX, Images, HTML, AsciiDoc, and Markdown.
    2024-11-01, by klotz
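
Items 3 and 9 above both describe the same basic pattern: render a page as an image and send it to a vision model running locally under Ollama. The following is a minimal sketch of that pattern, not the code from either article. It assumes Ollama is listening on its default local port and that a vision-capable model has already been pulled; the function names `build_ocr_request` and `ocr_page`, the prompt text, and the default model tag are illustrative choices.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ocr_request(image_bytes, model="gemma3",
                      prompt="Transcribe this page as Markdown."):
    """Build the JSON payload for one page image (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON reply instead of a token stream.
        "stream": False,
    }


def ocr_page(image_bytes, **kwargs):
    """Send one page image to the local model and return its text reply."""
    payload = json.dumps(build_ocr_request(image_bytes, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With a page already saved as `page1.png`, `ocr_page(open("page1.png", "rb").read(), model="llama3.2-vision")` would return the model's transcription. Rendering PDF pages to images is left out here and would need a separate PDF library.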
