NVIDIA Nemotron Parse v1.1 is designed to understand document semantics and extract text and tables elements with spatial grounding. It transforms unstructured documents into actionable and machine-usable representations.
SmolVLM2 represents a shift in video understanding technology by introducing efficient models that can run on various devices, from phones to servers. The release includes models of three sizes (2.2B, 500M, and 256M) with Python and Swift API support. These models offer video understanding capabilities with reduced memory consumption, supported by a suite of demo applications for practical use.