MinerU is a tool that converts PDFs into machine-readable formats (e.g., Markdown, JSON), so that their content can be extracted and reused in other formats.
MarkItDown is a utility for converting various files to Markdown, including PDF, PowerPoint, Word, Excel, images, audio, HTML, text-based formats, and ZIP files.
- Approximate Tokens, Words and Characters Calculator for LLMs and Text Trimmer — A simple calculator that estimates token counts for large language models, plus a text editor for trimming text
- Text File Merger for LLM — This tool combines multiple text files into a single document, with clear separation between files
- PDF to TXT Converter — Convert PDF documents to plain text format for use with LLMs and text analysis
- HTML to TXT Converter — Remove HTML tags and extract clean text content for LLM processing
- LLM System Prompt Generator — Generate optimized system prompts for different LLM model sizes (3B, 33B, 70B, etc.)
- Creative Idea Generator — AI-powered brainstorming tool for generating creative solutions and ideas
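
The token calculator and text trimmer at the top of this list can be approximated in a few lines. The sketch below is not that tool's implementation; it assumes a rough ratio of 4 characters per token, a common rule of thumb for English text. The names `CHARS_PER_TOKEN`, `estimate_counts`, and `trim_to_token_budget` are hypothetical, and real tokenizers will give different counts depending on the model and language.

```python
import math

# Assumption: roughly 4 characters per token, a rule of thumb for English text.
# Real tokenizers vary by model and language.
CHARS_PER_TOKEN = 4


def estimate_counts(text: str) -> dict:
    """Return character, word, and approximate token counts for text."""
    chars = len(text)
    words = len(text.split())
    tokens = math.ceil(chars / CHARS_PER_TOKEN)
    return {"characters": chars, "words": words, "tokens": tokens}


def trim_to_token_budget(text: str, max_tokens: int) -> str:
    """Cut text so its estimated token count does not exceed max_tokens."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(text) <= max_chars:
        return text
    trimmed = text[:max_chars]
    # Prefer to cut at the last space so a word is not split in half.
    cut = trimmed.rfind(" ")
    return trimmed[:cut] if cut > 0 else trimmed
```

For an exact count, the estimate would need to be replaced by the tokenizer of the specific model in use; the heuristic is only meant for quick budgeting.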
- wkhtmltopdf and wkhtmltoimage are open source command line tools for converting HTML pages into PDFs or images.
- They use the Qt WebKit rendering engine and run headlessly, without requiring a display.
- A C library is also available.
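
Because wkhtmltopdf is a command line tool, it can be driven from a script. The following is a minimal sketch that assumes the `wkhtmltopdf` binary is installed and on `PATH`, and uses only its basic form of an input (URL or HTML file) followed by an output PDF path. The helper names `build_command` and `html_to_pdf` are hypothetical and not part of wkhtmltopdf.

```python
import shutil
import subprocess


def build_command(source: str, output_pdf: str) -> list:
    """Build the argument list for a basic wkhtmltopdf invocation."""
    # Basic usage: wkhtmltopdf <input URL or HTML file> <output PDF>
    return ["wkhtmltopdf", source, output_pdf]


def html_to_pdf(source: str, output_pdf: str) -> None:
    """Run wkhtmltopdf; raises if the binary is missing or the run fails."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf is not on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_command(source, output_pdf), check=True)
```

A call such as `html_to_pdf("page.html", "page.pdf")` (both file names are placeholders) would then render the page without opening a window, since the tool runs headlessly.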