SemanticScuttle - klotz.me » Tags: document+llm+large language models

Tags: document* + llm* + large language models*

A Coding Implementation to Build a Document Search Agent (DocSearchAgent) with Hugging Face, ChromaDB, and Langchain

This tutorial shows how to build a document search engine with semantic search capabilities, using Hugging Face embeddings, ChromaDB, and LangChain.

2025-03-21 Tags: document, search, hugging face, chromadb, langchain, vector database, embedding, agents, llm by klotz

Overcome Failing Document Ingestion & RAG Strategies with Agentic Knowledge Distillation

This article introduces the pyramid search approach using Agentic Knowledge Distillation to address the limitations of traditional RAG strategies in document ingestion.

The pyramid structure allows for multi-level retrieval, including atomic insights, concepts, abstracts, and recollections. This structure mimics a knowledge graph but uses natural language, making it more efficient for LLMs to interact with.

**Knowledge Distillation Process**:
- **Conversion to Markdown**: Documents are converted to Markdown for better token efficiency and processing.
- **Atomic Insights Extraction**: Each page is processed using a two-page sliding window to generate a list of insights in simple sentences.
- **Concept Distillation**: Higher-level concepts are identified from the insights to reduce noise and preserve essential information.
- **Abstract Creation**: An LLM writes a comprehensive abstract for each document, capturing dense information efficiently.
- **Recollections/Memories**: Critical information useful across all tasks is stored at the top of the pyramid.
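
The two-page sliding window from the atomic insights step can be sketched roughly as follows. This is a minimal illustration, not the article's code: the function name `two_page_windows` is hypothetical, and it assumes each window pairs a page with the page that follows it (the summary does not say whether the neighbor is the next or the previous page). The LLM call that would turn each window into insights is left out.

```python
from typing import List, Tuple


def two_page_windows(pages: List[str]) -> List[Tuple[int, str]]:
    """Pair each page with the page that follows it.

    Assumption: a window is (current page, next page). The last page
    stands alone because nothing follows it.
    """
    windows = []
    for i in range(len(pages)):
        # Slicing never runs past the end, so the final window holds one page.
        window_text = "\n\n".join(pages[i:i + 2])
        windows.append((i, window_text))
    return windows


if __name__ == "__main__":
    # Placeholder page texts; in practice these would be the Markdown pages.
    pages = ["Page one text.", "Page two text.", "Page three text."]
    for index, text in two_page_windows(pages):
        # An insight-extraction call would consume `text` here (not shown).
        print(index, repr(text))
```

Overlapping windows give the extractor context for sentences that straddle a page break, at the cost of reading most pages twice.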

2025-03-07 Tags: agent, knowledge distillation, rag, document, pyramid search, llm, information retrieval, scuttle, summarizer by klotz

MarkItDown - Python tool for converting files and office documents to Markdown

MarkItDown is a utility for converting various files to Markdown, including PDF, PowerPoint, Word, Excel, Images, Audio, HTML, text-based formats, and ZIP files.

2024-12-30 Tags: markitdown, markdown, file conversion, python, office documents, pdf, powerpoint, word, excel, images, audio, html, csv, json, xml, zip, openai, large language models, docker, llm, document, conversion by klotz

DS4SD / docling

Docling is a tool that parses documents and exports them to desired formats like Markdown and JSON. It supports various document formats including PDF, DOCX, PPTX, Images, HTML, AsciiDoc, and Markdown.

2024-11-01 Tags: docling, ibm, document, parsing, markdown, json, pdf, docx, pptx, ocr, llm by klotz

Parse Your Invoices with LayoutLM and Label Studio

We introduce LayoutLM, one of the renowned models for extracting information from documents, developed by Microsoft. To tailor a solution for our specific needs, we label our documents using Label Studio, an open-source labeling tool, connected to our remote storage AWS S3.

2024-04-16 Tags: llm, document, image processing, recognition, pdf, invoice by klotz

Document AI Custom Extractor, powered by gen AI, is now Generally Available

Document AI Custom Extractor lets users train models for processing documents based on specific needs and requirements. It offers capabilities such as entity recognition, key information extraction, and data validation.

2024-01-12 Tags: document, llm, google, extraction, scraper by klotz

JPMorgan AI Research Introduces DocLLM: A Lightweight Extension to Traditional Large Language Models Tailored for Generative Reasoning Over Documents with Rich Layouts - MarkTechPost

2024-01-06 Tags: llm, multi-modal, structured, document, classification by klotz

Building RAG-Based Chatbots Using Streamlit + Langchain

2023-11-07 Tags: rag, llm, streamlit, langchain, document, search, summarization, q and a by klotz

Unveiling Ragna: An Open Source RAG-based AI Orchestration Framework Designed to Scale From Research to Production | Quansight Consulting

pip install 'ragna[all]' # Install ragna with all extensions
ragna config # Initialize configuration
ragna ui # Launch the web app

2023-11-02 Tags: llm, rag, python, ragna, ui, chat, document, orchestration by klotz

From RAGs to Riches: 10 Applications of vector search to deeply understand your data and models

Image Similarity Search
Reverse Image Search
Object Similarity Search
Robust OCR Document Search
Semantic Search
Cross-modal Retrieval
Probing Perceptual Similarity
Comparing Model Representations
Concept Interpolation
Concept Space Traversal

2023-10-26 Tags: llm, rag, similarity, vision, document, search, machine learning by klotz
