klotz: document*


  1. This article introduces the pyramid search approach using Agentic Knowledge Distillation to address the limitations of traditional RAG strategies in document ingestion.

    The pyramid structure allows for multi-level retrieval, including atomic insights, concepts, abstracts, and recollections. This structure mimics a knowledge graph but uses natural language, making it more efficient for LLMs to interact with.

    Knowledge Distillation Process:

    • Conversion to Markdown: Documents are converted to Markdown for better token efficiency and processing.
    • Atomic Insights Extraction: Each page is processed using a two-page sliding window to generate a list of insights in simple sentences.
    • Concept Distillation: Higher-level concepts are identified from the insights to reduce noise and preserve essential information.
    • Abstract Creation: An LLM writes a comprehensive abstract for each document, capturing dense information efficiently.
    • Recollections/Memories: Critical information useful across all tasks is stored at the top of the pyramid.
  2. A detailed guide on how to install DumbPad on your Synology NAS using Docker & Portainer. Learn how to set up and customize DumbPad for note-taking with auto-save and dark mode support.

    2025-02-13 by klotz
  3. Send once, read anywhere. Convert and send documents to your Kindle library or specific devices. Supported file types include PDF, DOCX, TXT, etc., with a max file size of 200 MB.

    2024-12-27 by klotz
  4. MarkItDown is a utility for converting various files to Markdown, including PDF, PowerPoint, Word, Excel, Images, Audio, HTML, text-based formats, and ZIP files.

  5. MegaParse is an open-source tool for parsing and converting documents for ingestion into LLMs. It supports multiple document formats, including text, PDF, PowerPoint, Excel, CSV, and Word, and offers customizable output formats to meet different LLM requirements.

  6. Docling is a tool that parses documents and exports them to desired formats like Markdown and JSON. It supports various document formats and provides advanced PDF understanding, metadata extraction, and integration with LlamaIndex and LangChain for RAG / QA applications.

    2024-11-01 by klotz
  7. Docling is a tool that parses documents and exports them to desired formats like Markdown and JSON. It supports various document formats including PDF, DOCX, PPTX, Images, HTML, AsciiDoc, and Markdown.

    2024-11-01 by klotz
  8. An open-source project offering a functional RAG UI for document QA, suitable for both end-users and developers. It supports various LLM providers, is customizable, and offers multi-modal QA, citations, and complex reasoning methods.

    2024-10-13 by klotz
  9. This article explores the use of large language models (LLMs) for document parsing, offering a more powerful and flexible alternative to traditional methods like regular expressions. It discusses the workflow involved in processing documents like research papers using LLMs, highlighting the benefits and advantages of this approach.

  10. We introduce LayoutLM, a well-known model developed by Microsoft for extracting information from documents. To tailor a solution to our specific needs, we label our documents using Label Studio, an open-source labeling tool, connected to our remote storage on AWS S3.
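
The atomic insights step described in the first bookmark (each page processed with a two-page sliding window) could be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the pairing of each page with its successor is an assumption (the summary does not say which neighbor is used), and `two_page_windows`, `naive_insights`, and `distill_insights` are hypothetical names. `naive_insights` is a placeholder for the LLM call and simply splits text into sentences.

```python
from typing import Callable, Iterator, List


def two_page_windows(pages: List[str]) -> Iterator[str]:
    """Yield one window per page: the page plus the page after it.

    Pairing with the *following* page is an assumption; the last page
    has no successor and is yielded on its own.
    """
    for i, page in enumerate(pages):
        following = pages[i + 1] if i + 1 < len(pages) else ""
        yield f"{page}\n{following}".strip()


def naive_insights(window: str) -> List[str]:
    """Placeholder for the LLM call: treat each sentence as one insight."""
    normalized = window.replace("\n", " ")
    return [s.strip() + "." for s in normalized.split(".") if s.strip()]


def distill_insights(
    pages: List[str],
    extract: Callable[[str], List[str]] = naive_insights,
) -> List[str]:
    """Collect insights from every window, dropping exact repeats.

    Overlapping windows see each page twice, so duplicates are expected;
    the first occurrence is kept and order is preserved.
    """
    seen = set()
    insights: List[str] = []
    for window in two_page_windows(pages):
        for insight in extract(window):
            if insight not in seen:
                seen.add(insight)
                insights.append(insight)
    return insights


if __name__ == "__main__":
    sample_pages = ["A is B. C is D.", "E is F.", "G is H."]
    for line in distill_insights(sample_pages):
        print(line)
```

In a real pipeline the `extract` argument would wrap a model call that returns simple sentences, and the overlap between windows is what lets an insight spanning a page break be captured in one piece.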
