klotz: pdf*


  1. A post discussing new techniques developed for parsing and searching PDFs, focusing on turning them into a hierarchical structure for RAG search. The approach involves dynamically generating chunks for searches, sending headers and sub-headers to the Language Model along with relevant chunks.
    2024-06-27 by klotz
  2. The llmsherpa project provides APIs to accelerate Large Language Model (LLM) projects. It includes LayoutPDFReader for PDF text parsing, smart chunking for vector search and Retrieval Augmented Generation, and table analysis. It is open source under the Apache 2.0 license.
  3. We introduce LayoutLM, a well-known model developed by Microsoft for extracting information from documents. To tailor a solution to our specific needs, we label our documents using Label Studio, an open-source labeling tool, connected to our remote AWS S3 storage.
  4. 2024-03-14 by klotz
  5. apt install tesseract-ocr-fra (Debian/Ubuntu package providing the French language data for Tesseract OCR)
    2024-03-04 by klotz
  6. - wkhtmltopdf is a set of open source command-line tools for converting HTML pages into PDFs or images.
    - It uses the Qt WebKit rendering engine and runs headlessly, without requiring a display.
    - A C library is available too.
    2024-02-06 by klotz
  7. PDFwhisper allows you to have a conversation with your PDF docs. Finding info on your PDF files is now easier than ever.
    2024-01-12 by klotz
  8. 2023-08-14 by klotz
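
The first bookmark describes turning a PDF into a hierarchical structure and sending headers and sub-headers to the language model along with the relevant chunks. The sketch below shows one way that idea might look in Python. It is not the linked post's implementation: the `Section` class, `iter_chunks`, `build_context`, and the naive word-overlap scoring are all hypothetical stand-ins, and a real pipeline would get the tree from a PDF parser and rank chunks with embeddings or a search index.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical tree node: a heading, its own paragraphs, and sub-sections."""
    title: str
    paragraphs: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)


def iter_chunks(section, path=()):
    """Yield (header_path, paragraph) for every paragraph, depth first."""
    here = path + (section.title,)
    for para in section.paragraphs:
        yield here, para
    for child in section.children:
        yield from iter_chunks(child, here)


def build_context(root, query, top_k=2):
    """Pick up to top_k paragraphs by word overlap (an assumed, deliberately
    simple score) and prefix each one with its chain of headers."""
    words = set(query.lower().split())
    scored = []
    for header_path, para in iter_chunks(root):
        overlap = len(words & set(para.lower().split()))
        if overlap:
            scored.append((overlap, header_path, para))
    # Stable sort: ties keep their original document order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [" > ".join(h) + "\n" + p for _, h, p in scored[:top_k]]


# Made-up document tree standing in for a parsed PDF.
doc = Section("Manual", children=[
    Section("Installation", ["Run the installer and accept the license."]),
    Section("Usage", children=[
        Section("Search", ["Type a query to search the parsed document."]),
    ]),
])

for chunk in build_context(doc, "how do I search the document"):
    print(chunk)
```

Because each chunk carries its header path, the model sees where a paragraph sits in the document even when only a few paragraphs are retrieved.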
