This article covers the release of Gemma 3, the latest iteration of Google’s open-weights language model family. Key improvements include **vision-language capabilities** (via a tailored SigLIP vision encoder), **increased context length** (up to 128k tokens for the larger models), and **architectural changes for improved memory efficiency** (local and global attention layers interleaved at a 5-to-1 ratio, and removal of softcapping). Gemma 3 outperforms Gemma 2 across benchmarks and ships in sizes suited to different use cases, including a 1B model aimed at on-device applications.
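To make the memory argument behind 5-to-1 interleaved attention concrete, the following is a minimal sketch, not the actual Gemma 3 implementation. It assumes that "5-to-1" means five local (sliding-window) attention layers for every global layer, and that a local layer only needs to cache keys and values for the most recent `window` tokens. The layer count (12), the window size (1024), and the helper names `layer_pattern` and `kv_cache_positions` are illustrative assumptions.

```python
def layer_pattern(num_layers: int, local_per_global: int = 5) -> list[str]:
    """Label each layer 'local' or 'global'; every (local_per_global + 1)-th layer is global."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(num_layers)]


def kv_cache_positions(pattern: list[str], context_len: int, window: int) -> int:
    """Count cached token positions summed over all layers.

    A global layer caches the whole context; a local layer caches at most
    `window` positions (assumed sliding-window behaviour).
    """
    return sum(
        context_len if kind == "global" else min(window, context_len)
        for kind in pattern
    )


# Hypothetical configuration, chosen only for illustration.
NUM_LAYERS = 12
CONTEXT_LEN = 128_000
WINDOW = 1024

pattern = layer_pattern(NUM_LAYERS)
all_global = kv_cache_positions(["global"] * NUM_LAYERS, CONTEXT_LEN, WINDOW)
interleaved = kv_cache_positions(pattern, CONTEXT_LEN, WINDOW)

print(pattern)
print(f"all-global cache positions:  {all_global}")
print(f"interleaved cache positions: {interleaved}")
print(f"ratio: {interleaved / all_global:.3f}")
```

Under these assumptions, most of the cache cost at long context comes from the few global layers, which is why shifting the ratio toward local layers reduces memory. Real savings also depend on head counts, head dimensions, and numeric precision, none of which are modelled here.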
This document explains how to run Gemma models: choosing a framework, selecting a model variant, and issuing generation and inference requests. It stresses matching these choices to the available hardware resources and offers recommendations for beginners.
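As a rough illustration of how hardware resources constrain variant choice, the sketch below estimates the memory needed just to hold model weights at a given precision. It is a back-of-the-envelope calculation, not an official sizing guide. It assumes nominal parameter counts of 1B, 4B, 12B, and 27B, and it ignores the KV cache, activations, and framework overhead except through a hypothetical `headroom` multiplier. The function names are illustrative.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 1e9


def largest_fitting(budget_gb: float, dtype: str,
                    sizes=(1, 4, 12, 27), headroom: float = 1.2):
    """Return the largest nominal size (in billions of parameters) whose weights,
    scaled by an assumed headroom factor, fit in budget_gb, or None if none fit."""
    fitting = [s for s in sizes if weight_memory_gb(s, dtype) * headroom <= budget_gb]
    return max(fitting) if fitting else None


for dtype in BYTES_PER_PARAM:
    print(dtype, largest_fitting(16, dtype))
```

Treat the result as a first filter only. Long contexts, image inputs, and batch size can add substantial memory on top of the weights, so measuring on the target device remains the reliable check.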