IBM has introduced Granite 4.0 3B Vision, a specialized vision-language model (VLM) engineered for high-fidelity enterprise document data extraction. Unlike monolithic multimodal models, this release uses a modular LoRA adapter architecture, adding approximately 0.5B parameters to the Granite 4.0 Micro base model. This design allows for efficient dual-mode deployment, activating vision capabilities only when multimodal processing is required. The model excels at converting complex visual elements, such as charts and tables, into structured machine-readable formats like JSON, HTML, and CSV. By utilizing a high-resolution tiling mechanism and a DeepStack architecture for improved spatial alignment, Granite 4.0 3B Vision achieves impressive accuracy in tasks like Key-Value Pair extraction and chart reasoning, ranking highly on industry benchmarks.