SemanticScuttle - klotz.me » klotz: llm+ocr

klotz: llm* + ocr*

AI models make precise copies of cuneiform characters

Machine learning models can now accurately replicate cuneiform characters from photos of ancient tablets, making complex scripts easier to read. The ProtoSnap approach aligns a prototype of each character with its individual variations on a tablet, enabling precise reproduction. This improves optical character recognition, particularly for rare and highly variable characters, and could significantly increase the number of ancient texts available for analysis.

2025-03-05 Tags: llm, vlm, cuneiform, ocr, ancient history by klotz

olmOCR: Toolkit for Training Language Models to Work with PDF Documents

A toolkit for training language models to work with PDF documents in the wild, including prompting strategies, evaluation tools, filtering, finetuning code, and processing PDFs through finetuned models.

2025-02-28 Tags: pdf, llm, pdf processing, olmocr, allenai, ocr, document management, document conversion by klotz

Introducing Qwen2.5-VL: Advanced Vision-Language Model Capabilities

Qwen2.5-VL, the latest vision-language model from Qwen, showcases enhanced image recognition, agentic behavior, video comprehension, document parsing, and more. It outperforms previous models in various benchmarks and tasks, offering improved efficiency and performance.

2025-02-09 Tags: qwen2.5-vl, vision-language model, image recognition, document parsing, ocr, multimodal, llm, machine learning by klotz

Microsoft Open Sourced MarkItDown: An AI Tool to Convert All Files into Markdown for Seamless Integration and Analysis

Microsoft has open-sourced MarkItDown, a state-of-the-art application designed to convert various file types into Markdown format for seamless integration, collaboration, and accessibility. The tool supports multiple file formats, including PDFs, PowerPoint presentations, Word documents, Excel spreadsheets, images, audio, HTML, text-based formats, and ZIP files, making it a versatile utility for users across different domains.

2024-12-20 Tags: microsoft, markitdown, llm, markdown, ocr, exif, speech, conversion by klotz

How to Read a Bank Statement and Actually Understand It

A guide to reading and understanding bank statements, highlighting their key components and terms and discussing their importance for financial management and fraud prevention.

2024-11-30 Tags: bank statement, fintech, ocr, document recognition, antifraud, llm, prompt engineering by klotz

Build a Local Ollama OCR Application Powered By Llama 3.2-Vision

This article guides readers through building an OCR application in Python using the Llama 3.2-Vision model served through Ollama. It includes steps for setting up the environment, installing the necessary tools, and writing the OCR script.

2024-11-21 Tags: ollama, ocr, llm, llama 3.2-vision by klotz

DS4SD / docling

Docling is a tool that parses documents and exports them to formats such as Markdown and JSON. It supports various input formats, including PDF, DOCX, PPTX, images, HTML, AsciiDoc, and Markdown.

2024-11-01 Tags: docling, ibm, document, parsing, markdown, json, pdf, docx, pptx, ocr, llm by klotz

LucidWebSearch

A web search extension for Oobabooga's text-generation-webui (now with Nougat) that lets the model use web search results.

2024-09-03 Tags: llm, search, text-generation, oobabooga, extension, ocr, nougat by klotz

OpenRecall: A Free and Open Source Alternative to Microsoft Recall Feature

OpenRecall is open-source software that aims to be a privacy-focused alternative to Microsoft's Recall feature. It captures the user's digital history, processes text and images using OCR, and lets users find specific information by searching for relevant keywords. Currently, it stores data locally but does not encrypt it. It is available for Windows, macOS, and Linux.

2024-07-18 Tags: openrecall, microsoft, recall, ocr, llm, knowledge base by klotz

R2R: Production-Ready RAG systems

SciPhi-AI/R2R is a framework for rapid development and deployment of production-ready RAG pipelines. The framework enables the deployment, customization, extension, autoscaling, and optimization of RAG pipeline systems, making it easier for the OSS community to use them. It includes several code examples and client applications that demonstrate application deployment and interaction. The core abstractions come in the form of ingestion, embedding, RAG, and eval pipelines.

2024-04-02 Tags: search, pdf, machine-learning, ocr, deep-learning, retrieval, chatbot, artificial-intelligence, question-answering, data-pipelines, retrieval-systems, large-language-models, llm, langchain, llama-index, retrieval-augmented-generation by klotz
