SemanticScuttle - klotz.me » Tags: vlm+transformers

Tags: vlm* + transformers*

NVIDIA Nemotron Parse v1.1 is designed to understand document semantics and extract text and table elements with spatial grounding. It transforms unstructured documents into actionable, machine-usable representations.

2025-11-28 Tags: image-to-text, transformers, ocr, vlm, feature-extraction, nvidia, document understanding, table extraction by klotz

SmolVLM2: Bringing Video Understanding to Every Device

SmolVLM2 introduces efficient video understanding models that can run on a range of devices, from phones to servers. The release includes models in three sizes (2.2B, 500M, and 256M parameters) with Python and Swift API support. These models offer video understanding with reduced memory consumption and come with a suite of demo applications for practical use.

2025-02-21 Tags: smolvlm2, video understanding, python, machine learning, video, transformers, mlx, vlm, llm by klotz
