SemanticScuttle - klotz.me » klotz: multimodal model

klotz: multimodal model

Thinking Machines has released Inkling, an open-weights Mixture-of-Experts transformer with 975B total parameters and a context window of up to 1M tokens. The model was trained on 45 trillion tokens spanning text, images, audio, and video to enable native multimodal reasoning. Its thinking effort is controllable, so users can trade performance against cost and latency, and it offers strong capabilities for agentic coding and tool use.
- Native multimodality in vision, audio, and text
- Controllable computational effort settings
- High proficiency in agentic workflows and design tasks
- Available on Tinker for custom fine-tuning
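Inkling is described as a Mixture-of-Experts model, and in MoE designs generally only a few experts run for each token. A minimal sketch of generic top-k routing, assuming a toy dot-product router and experts that just scale their input; every name here is hypothetical and none of it is Inkling's actual implementation:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, experts, k=2):
    """Send one token vector x through its top-k experts and mix their outputs.

    router_weights: one weight vector per expert (hypothetical toy linear router).
    experts: callables mapping a vector to a vector of the same length.
    """
    # Router score per expert: dot product of the token with that expert's router vector.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Keep only the k highest-scoring experts; the others are never called.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the router scores over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, top


# Toy setup: 8 experts, each scales its input by (index + 1) and logs when it runs.
calls = []


def make_expert(i):
    def expert(v):
        calls.append(i)
        return [(i + 1) * t for t in v]
    return expert


experts = [make_expert(i) for i in range(8)]
router_weights = [[float(i), 0.0] for i in range(8)]
out, top = moe_forward([1.0, 0.0], router_weights, experts, k=2)
print(top, sorted(calls), out)
```

Only 2 of the 8 toy experts execute for this token, which is the sense in which an MoE model's active parameter count is much smaller than its total parameter count.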

2026-07-16 Tags: inkling, mixture-of-experts, multimodal model, open weights, machine learning by klotz

unsloth/diffusiongemma-26B-A4B-it-GGUF

This page provides GGUF quantized versions of DiffusionGemma 26B A4B-it, a multimodal model from Google DeepMind based on the Gemma 4 architecture. The model uses discrete text diffusion with block-autoregressive multi-canvas sampling to decode significantly faster than standard autoregressive models. It accepts interleaved text, images at variable resolutions and aspect ratios, and video, and generates text output.
Key topics:
- Mixture-of-Experts architecture with 3.8 billion active parameters.
- High-speed generation through parallel denoising of token blocks.
- Multimodal input support including image and video understanding.
- Context window of up to 256K tokens.
- Integrated reasoning modes for step-by-step thinking.
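The summary says DiffusionGemma speeds up generation by denoising blocks of tokens in parallel. A toy sketch of block-wise masked-diffusion decoding, assuming a hypothetical `toy_denoiser` oracle in place of the real model and a simple "commit the most confident positions" schedule; it does not model the multi-canvas sampler or DiffusionGemma's actual schedule:

```python
MASK = None  # placeholder for a not-yet-decoded position


def toy_denoiser(context, block, offset, target):
    """Hypothetical stand-in for the model.

    In one parallel call, proposes a (token, confidence) pair for every masked
    position in the current block. It simply reads the answer from `target`;
    a real model would condition on `context` and the partly filled block.
    """
    proposals = {}
    for j, tok in enumerate(block):
        if tok is MASK:
            # Toy confidence: positions earlier in the block score higher.
            proposals[j] = (target[offset + j], 1.0 / (1 + j))
    return proposals


def block_diffusion_decode(length, block_size, tokens_per_step, denoiser):
    """Decode `length` tokens block by block, unmasking several tokens per step."""
    out, steps = [], 0
    while len(out) < length:
        size = min(block_size, length - len(out))
        block = [MASK] * size  # each new block starts fully masked
        while MASK in block:
            proposals = denoiser(out, block, len(out))
            # Commit the most confident proposals this step, in parallel.
            ranked = sorted(proposals, key=lambda j: proposals[j][1], reverse=True)
            for j in ranked[:tokens_per_step]:
                block[j] = proposals[j][0]
            steps += 1
        out.extend(block)  # the finished block is frozen; later blocks condition on it
    return out, steps


target = "the quick brown fox jumps over the lazy dog".split()
out, steps = block_diffusion_decode(
    len(target), block_size=4, tokens_per_step=2,
    denoiser=lambda ctx, blk, off: toy_denoiser(ctx, blk, off, target),
)
print(steps, " ".join(out))
```

Here 9 tokens take 5 denoising steps instead of 9 sequential decoding steps; the real speedup depends on how many tokens the model can reliably commit per step.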

2026-06-12 Tags: diffusion gemma, unsloth, gguf, google deepmind, multimodal model, mixture of experts, llm by klotz

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Google has introduced Gemma 4 12B, a mid-sized multimodal model designed to bring agentic intelligence directly to consumer laptops. It bridges the gap between the smaller edge models and the larger Mixture-of-Experts versions, offering high performance with a significantly smaller memory footprint. A key innovation is its encoder-free architecture: vision and audio inputs flow directly into the language model backbone instead of passing through separate encoders that add latency.
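One common way to feed images straight into a language model backbone without a separate encoder, assumed here and not confirmed as Gemma 4 12B's actual design, is to cut the image into patches and map each patch into the text embedding space with a single learned linear projection. A toy sketch with hypothetical names and hand-picked weights (audio omitted):

```python
def patchify(image, p):
    """Split an H x W grayscale image (list of rows) into flattened p x p patches, row-major."""
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "toy sketch assumes dimensions divisible by p"
    patches = []
    for r in range(0, h, p):
        for c in range(0, w, p):
            patches.append([image[r + i][c + j] for i in range(p) for j in range(p)])
    return patches


def project(vec, weight):
    """Multiply a row vector by a (len(vec) x d_model) matrix."""
    d_model = len(weight[0])
    return [sum(vec[i] * weight[i][k] for i in range(len(vec))) for k in range(d_model)]


def build_sequence(segments, token_table, patch_weight, p):
    """Interleave text and image segments into one embedding sequence for the backbone.

    segments: list of ("text", [token ids]) or ("image", image) pairs.
    """
    seq = []
    for kind, payload in segments:
        if kind == "text":
            seq.extend(token_table[t] for t in payload)
        else:
            # No separate vision encoder: each patch is linearly projected straight
            # into the same embedding space the text tokens use.
            seq.extend(project(patch, patch_weight) for patch in patchify(payload, p))
    return seq


# Toy example: d_model = 2, a 4x4 image cut into four 2x2 patches.
token_table = {0: [1.0, 0.0], 1: [0.0, 1.0]}
patch_weight = [[1, 0], [1, 0], [0, 1], [0, 1]]  # top row of patch -> dim 0, bottom row -> dim 1
image = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
seq = build_sequence([("text", [0]), ("image", image), ("text", [1])], token_table, patch_weight, p=2)
print(len(seq), seq)
```

The text and image positions end up in one flat sequence of same-width vectors, which is what lets the backbone attend across modalities without an encoder stage in between.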
Main topics:
- Novel unified architecture without multimodal encoders
- Native support for direct audio and vision input processing
- Optimized for local execution on hardware with 16 GB of RAM
- Reasoning performance approaching that of much larger 26B models
- Released under the Apache 2.0 license
- Integrated Multi-Token Prediction drafters to reduce latency
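The 16 GB RAM target can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly parameters × bits per weight / 8. A minimal sketch, assuming a nominal 12 × 10^9 parameters and counting weights only; the helper name is hypothetical:

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in decimal GB (1e9 bytes).

    Ignores the KV cache, activations, quantization scale metadata, and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


BUDGET_GB = 16  # RAM figure quoted in the post

if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_memory_gb(12e9, bits)
        verdict = "under" if gb < BUDGET_GB else "over"
        print(f"12B params at {label}: {gb:.1f} GB of weights ({verdict} {BUDGET_GB} GB)")
```

At 16-bit precision the weights alone (about 24 GB) exceed 16 GB, while 8-bit (12 GB) and 4-bit (6 GB) leave headroom for the KV cache and the operating system; the post does not say which precision the 16 GB target assumes.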

2026-06-03 Tags: gemma 4 12b, google deepmind, multimodal model, encoder-free architecture, local ai, agentic workflows, developer tools, apache 2.0 by klotz