This page provides GGUF-quantized versions of DiffusionGemma 26B A4B-it, a multimodal model from Google DeepMind based on the Gemma 4 architecture. The model uses discrete text diffusion with block-autoregressive multi-canvas sampling, which decodes significantly faster than standard autoregressive models. It accepts interleaved text, images of variable resolution and aspect ratio, and video as input, and generates text as output.
Key topics:
- Mixture-of-Experts architecture with 3.8 billion active parameters.
- High-speed generation through parallel denoising of token blocks.
- Multimodal input support including image and video understanding.
- Extensive context window capability up to 256K tokens.
- Integrated reasoning modes for step-by-step thought processes.
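
The parallel denoising of token blocks listed above can be illustrated with a toy sketch. This is not DiffusionGemma's sampler: `toy_denoiser`, `generate`, and the confidence-based commit rule are hypothetical stand-ins, and multi-canvas sampling is not modeled. It only shows why filling a block in a few denoising steps needs fewer model calls than emitting one token per call.

```python
import random

MASK = None  # placeholder for a position that has not been decoded yet


def toy_denoiser(context, block, rng):
    """Hypothetical stand-in for the model.

    Proposes a (token, confidence) pair for every masked slot in the block.
    A real model would condition on `context`; here it only sets the position.
    """
    proposals = {}
    for i, tok in enumerate(block):
        if tok is MASK:
            position = len(context) + i
            proposals[i] = (f"tok{position}", rng.random())
    return proposals


def generate(num_blocks, block_size, steps_per_block, seed=0):
    """Decode block by block; within a block, fill several slots per model call."""
    rng = random.Random(seed)
    context = []
    model_calls = 0
    # Number of slots committed per denoising step (ceiling division).
    per_step = -(-block_size // steps_per_block)
    for _ in range(num_blocks):
        block = [MASK] * block_size
        while any(tok is MASK for tok in block):
            proposals = toy_denoiser(context, block, rng)
            model_calls += 1
            # Commit the most confident proposals; the rest stay masked.
            best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
            for i in best[:per_step]:
                block[i] = proposals[i][0]
        # Blocks are appended left to right: the block-autoregressive part.
        context.extend(block)
    return context, model_calls


tokens, calls = generate(num_blocks=4, block_size=8, steps_per_block=2)
print(len(tokens), "tokens in", calls, "model calls")
```

With 4 blocks of 8 tokens and 2 denoising steps per block, this toy loop makes 8 model calls for 32 tokens, where a one-token-per-call decoder would make 32.
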
Google has introduced Gemma 4 12B, a mid-sized multimodal model designed to bring agentic intelligence directly to consumer laptops. It bridges the gap between smaller edge models and the larger Mixture-of-Experts versions, offering high performance with a significantly smaller memory footprint. A key innovation is its encoder-free architecture: vision and audio inputs flow directly into the language model backbone instead of passing through separate encoders that add latency.
Main topics:
- Novel unified architecture without multimodal encoders.
- Native support for direct audio and vision input processing.
- Optimized for local execution on hardware with 16GB of RAM.
- Reasoning performance nearing much larger 26B models.
- Released under an Apache 2.0 license.
- Integrated Multi-Token Prediction drafters to reduce latency.
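
As a rough check on the 16GB figure above, weight memory can be estimated as parameter count times bits per weight. This is a back-of-the-envelope sketch under stated assumptions: the 12 billion parameter count is read off the model name, the bit widths are illustrative rather than published release formats, and `overhead_gb` is a hypothetical allowance for the KV cache, activations, and quantization metadata, none of which are modeled precisely.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_in_ram(num_params, bits_per_weight, ram_gb, overhead_gb=2.0):
    """True if weights plus an assumed fixed overhead fit in the given RAM."""
    return weight_memory_gb(num_params, bits_per_weight) + overhead_gb <= ram_gb


PARAMS_12B = 12e9  # assumption: taken from the "12B" in the model name
RAM_GB = 16

for bits in (16, 8, 4):
    size = weight_memory_gb(PARAMS_12B, bits)
    verdict = "fits" if fits_in_ram(PARAMS_12B, bits, RAM_GB) else "does not fit"
    print(f"{bits:>2}-bit weights: {size:.0f} GB, {verdict} in {RAM_GB} GB")
```

Under these assumptions the weights alone take about 24 GB at 16 bits, 12 GB at 8 bits, and 6 GB at 4 bits, so only the lower-precision variants leave headroom within 16 GB.
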