We introduce LayoutLM, one of the renowned models for extracting information from documents, developed by Microsoft. To tailor a solution for our specific needs, we label our documents using Label Studio, an open-source labeling tool, connected to our remote storage AWS S3.
- WKHTMLTOPDF is a set of open source command line tools for converting HTML pages into PDFs or images.
- It uses Qt WebKit rendering engine and runs headlessly without requiring a display.
- A C library is available too.
PDFwhisper allows you to have a conversation with your PDF docs. Finding info on your PDF files is now easier than ever.