IBM has introduced Granite 4.0 3B Vision, a specialized vision-language model (VLM) engineered for high-fidelity enterprise document data extraction. Unlike monolithic multimodal models, this release uses a modular LoRA adapter architecture, adding approximately 0.5B parameters to the Granite 4.0 Micro base model. This design allows for efficient dual-mode deployment, activating vision capabilities only when multimodal processing is required. The model excels at converting complex visual elements, such as charts and tables, into structured machine-readable formats like JSON, HTML, and CSV. By utilizing a high-resolution tiling mechanism and a DeepStack architecture for improved spatial alignment, Granite 4.0 3B Vision achieves impressive accuracy in tasks like Key-Value Pair extraction and chart reasoning, ranking highly on industry benchmarks.
A symposium dedicated to Invisible XML (ixml), a language and process for identifying structure in documents. The event will be held online on February 26–27, 2026, and attendance is free. The schedule includes presentations on various aspects of ixml implementation, syntax, and applications.
Rensa is a high-performance MinHash suite written in Rust with Python bindings, designed for efficient similarity estimation and deduplication of large datasets. It offers R-MinHash, C-MinHash, and OptDensMinHash variants that are significantly faster than datasketch while maintaining comparable accuracy.
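Rensa's variants build on the classic MinHash technique: compare short signatures instead of full sets to estimate Jaccard similarity. A minimal pure-Python sketch of that underlying idea (illustrative only, not Rensa's Rust implementation or its Python API):

```python
import random

def minhash_signature(items, num_perm=128, seed=1):
    """Build a MinHash signature: one slot per simulated permutation."""
    rng = random.Random(seed)
    p = (1 << 61) - 1  # large Mersenne prime for universal hashing
    # Each (a, b) pair defines a hash function h(x) = (a*x + b) % p.
    params = [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(num_perm)]
    hashes = [hash(it) & 0xFFFFFFFF for it in items]
    # Keep the minimum hashed value under each function.
    return [min((a * h + b) % p for h in hashes) for a, b in params]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

With 128 permutations the estimate is typically within a few percentage points of the true Jaccard similarity; Rensa's value is doing this (and denser variants like OptDensMinHash) at Rust speed over large corpora.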
Docling simplifies document processing, parsing diverse formats (including advanced PDF understanding) and providing seamless integrations with the gen AI ecosystem.
This tutorial demonstrates how to build a powerful document search engine using Hugging Face embeddings, Chroma DB, and Langchain for semantic search capabilities.
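The core of such a pipeline is embedding documents and ranking them by vector similarity to a query. As a self-contained stand-in (toy term-frequency vectors instead of Hugging Face embeddings, and a plain list instead of Chroma DB), the retrieval step looks like:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: a term-frequency vector. A real pipeline would call a
    # Hugging Face sentence-embedding model here instead.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def search(query, docs, k=2):
    # Rank stored documents by similarity to the query vector, which is
    # what a vector store like Chroma does efficiently at scale.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

Swapping the toy embedding for a real model and the list for a vector store gives semantic rather than keyword matching, which is the point of the tutorial's stack.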
This article introduces the pyramid search approach using Agentic Knowledge Distillation to address the limitations of traditional RAG strategies in document ingestion.
The pyramid structure allows for multi-level retrieval, including atomic insights, concepts, abstracts, and recollections. This structure mimics a knowledge graph but uses natural language, making it more efficient for LLMs to interact with.
**Knowledge Distillation Process**:
- **Conversion to Markdown**: Documents are converted to Markdown for better token efficiency and processing.
- **Atomic Insights Extraction**: Each page is processed using a two-page sliding window to generate a list of insights in simple sentences.
- **Concept Distillation**: Higher-level concepts are identified from the insights to reduce noise and preserve essential information.
- **Abstract Creation**: An LLM writes a comprehensive abstract for each document, capturing dense information efficiently.
- **Recollections/Memories**: Critical information useful across all tasks is stored at the top of the pyramid.
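The distillation steps above can be sketched as a layered store that supports multi-level retrieval. The level names follow the article; the data layout and retrieval function are illustrative assumptions, not the article's actual implementation:

```python
# Pyramid levels, top (most general) to bottom (most specific).
LEVELS = ["recollections", "abstracts", "concepts", "insights"]

def build_pyramid():
    # Each level maps a document id to its distilled entries.
    return {level: {} for level in LEVELS}

def add_document(pyramid, doc_id, insights, concepts, abstract, recollections=()):
    pyramid["insights"][doc_id] = list(insights)
    pyramid["concepts"][doc_id] = list(concepts)
    pyramid["abstracts"][doc_id] = abstract
    pyramid["recollections"][doc_id] = list(recollections)

def retrieve(pyramid, doc_id, depth="concepts"):
    # Walk top-down, collecting everything at or above the requested depth.
    # An agent would choose the depth per query: cheap recollections and
    # abstracts for broad questions, atomic insights for detailed ones.
    out = {}
    for level in LEVELS:
        out[level] = pyramid[level].get(doc_id)
        if level == depth:
            break
    return out
```

Because every layer is plain natural language, an LLM can consume any slice of the pyramid directly, without the query translation a formal knowledge graph would require.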
A detailed guide on how to install DumbPad on your Synology NAS using Docker & Portainer. Learn how to set up and customize DumbPad for note-taking with auto-save and dark mode support.
Send once, read anywhere. Convert and send documents to your Kindle library or specific devices. Supported file types include PDF, DOCX, TXT, etc., with a max file size of 200 MB.
MarkItDown is a utility for converting various files to Markdown, including PDF, PowerPoint, Word, Excel, Images, Audio, HTML, text-based formats, and ZIP files.
MegaParse is an open-source tool for parsing and converting various types of documents for ingestion into LLMs. It supports multiple formats, including text, PDF, PowerPoint, Excel, CSV, and Word documents, and offers customizable output formats to meet different LLM requirements, making it a versatile and efficient solution for LLM data preparation.