MarkItDown is a utility for converting various files to Markdown, including PDF, PowerPoint, Word, Excel, Images, Audio, HTML, text-based formats, and ZIP files.
GitHub Models now allows developers to retrieve structured JSON responses from models directly in the UI, improving integration with applications and workflows. Supported models include OpenAI (except for o1-mini and o1-preview) and Mistral models.
sqlite-utils-ask, a new plugin for the sqlite-utils CLI tool, lets users ask human-language questions directly of SQLite databases and CSV/JSON files, using an LLM to generate SQL queries and then executing them.
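
The question-to-SQL loop described above can be sketched roughly as follows. This is a minimal illustration using only Python's standard `sqlite3` module, not the plugin's actual code: `generate_sql` and `ask` are hypothetical names, and the LLM call is replaced by a hard-coded stub so the example runs without a model or API key.

```python
import sqlite3


def generate_sql(question: str, schema: str) -> str:
    """Hypothetical stand-in for the LLM call.

    A real tool would send the question and the schema to a model
    and receive a SQL query back.
    """
    # Hard-coded answer so the sketch runs offline (assumption, not plugin code).
    return "SELECT name FROM dogs ORDER BY age DESC LIMIT 1"


def ask(conn: sqlite3.Connection, question: str) -> list[tuple]:
    # Collect the CREATE TABLE statements so the model can see the schema.
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    schema = "\n".join(row[0] for row in rows)

    sql = generate_sql(question, schema)

    # Crude guard: refuse anything that does not start with SELECT.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError(f"refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO dogs VALUES (?, ?)", [("Cleo", 5), ("Pancakes", 9)])

result = ask(conn, "Which dog is the oldest?")
print(result)
```

Passing the schema along with the question is the key design choice here: without the table and column names, a model has no way to produce a query that matches the database.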
Docling is a tool that parses documents and exports them to formats such as Markdown and JSON. It supports various document formats and provides advanced PDF understanding, metadata extraction, and integration with LlamaIndex and LangChain for RAG / QA applications.
Docling's supported input formats include PDF, DOCX, PPTX, images, HTML, AsciiDoc, and Markdown.
A new plugin for LLM, llm-jq, generates and executes jq programs based on human-language descriptions, allowing users to manipulate JSON data without needing to write jq syntax.
The author records a screen capture of their Gmail account and uses Google Gemini to extract numeric values from the video.
A comprehensive guide that tests the structured output capabilities of Google Gemini, Anthropic Claude, and OpenAI GPT. It finds that OpenAI GPT-4o offers the most consistent structured outputs out of the box.
Weaviate introduces StructuredRAG, a benchmark to evaluate LLMs' ability to generate reliable JSON outputs. The study finds that while LLMs perform well on simpler tasks, they struggle with more complex outputs.
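
To make "reliable JSON outputs" concrete, here is a rough sketch of how such a check might be scored. It is not StructuredRAG's actual metric or code: the function names, the expected-format dictionary, and the sample responses are all assumptions made for illustration.

```python
import json


def is_valid(response: str, expected: dict[str, type]) -> bool:
    """Return True if the response is a JSON object with exactly the expected keys and types."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        # Any extra prose around the JSON makes the whole response unparseable.
        return False
    if not isinstance(data, dict) or set(data) != set(expected):
        return False
    return all(isinstance(data[key], typ) for key, typ in expected.items())


def success_rate(responses: list[str], expected: dict[str, type]) -> float:
    # Fraction of responses that follow the requested format.
    return sum(is_valid(r, expected) for r in responses) / len(responses)


# Hypothetical target format and model responses.
expected = {"answer": str, "confidence": int}
responses = [
    '{"answer": "Paris", "confidence": 5}',                          # valid
    'Sure! Here is the JSON: {"answer": "Paris", "confidence": 5}',  # prose wrapper
    '{"answer": "Paris"}',                                           # missing key
    '{"answer": "Paris", "confidence": "high"}',                     # wrong type
]

rate = success_rate(responses, expected)
print(rate)
```

The three failing samples show common ways a response can break a downstream parser even when the underlying answer is correct.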
We introduce NuExtract, a lightweight text-to-JSON LLM. NuExtract extracts arbitrarily complex information from text and turns it into structured data.