Tags: large language models* + docllm* + visual documents* + pre-training* + spatial layout*


  1. DocLLM is a lightweight extension to traditional LLMs for reasoning over visual documents that takes both textual semantics and spatial layout into account. It avoids expensive image encoders and focuses on bounding box information instead. It outperforms state-of-the-art LLMs on 14 of 16 datasets across all tasks and generalizes well to previously unseen datasets.
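
    To make the "bounding box information" idea concrete, here is a minimal sketch of how OCR words might be paired with page-normalized bounding boxes before being handed to a layout-aware model. This is an illustration only, not DocLLM's actual code or API: the names `LayoutToken`, `normalize_box`, and `build_layout_tokens`, the `(x0, y0, x1, y1)` box convention, and the sample words are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); assumed convention


@dataclass
class LayoutToken:
    """A word plus its position on the page (hypothetical structure)."""
    text: str
    box: Box  # coordinates scaled to the range [0, 1]


def normalize_box(box: Box, page_width: float, page_height: float) -> Box:
    """Scale pixel coordinates so they are independent of page size."""
    x0, y0, x1, y1 = box
    return (x0 / page_width, y0 / page_height,
            x1 / page_width, y1 / page_height)


def build_layout_tokens(ocr_words: List[Tuple[str, Box]],
                        page_width: float,
                        page_height: float) -> List[LayoutToken]:
    """Pair each OCR word with its normalized bounding box."""
    return [
        LayoutToken(text, normalize_box(box, page_width, page_height))
        for text, box in ocr_words
    ]


# Made-up OCR output for a 500 x 1000 pixel page (illustrative data only).
ocr_words = [
    ("Invoice", (50, 40, 150, 60)),
    ("Total:", (50, 700, 110, 720)),
    ("$42.00", (400, 700, 470, 720)),
]
tokens = build_layout_tokens(ocr_words, page_width=500, page_height=1000)
```

    No image pixels appear anywhere in this representation: the text and the box coordinates are the only inputs, which is the sense in which such an approach can skip an image encoder.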
