SemanticScuttle - klotz.me » klotz: constrained decoding

klotz: constrained decoding*

Sparse Transition Matrix-Accelerated Trie Index for Constrained Decoding

This repository provides the official implementation of the STATIC (Sparse Transition Matrix-Accelerated Trie Index for Constrained Decoding) framework, as described in Su et al., 2026. STATIC is a high-performance method for constraining outputs to a prespecified set during autoregressive decoding from large language models, designed for maximum efficiency on modern hardware accelerators such as GPUs and TPUs.

2026-03-02 Tags: constrained decoding, large language models, sparse trie, accelerator, jax, pytorch, inference, beam search, github, youtube, google by klotz

Google AI Introduces STATIC: A Sparse Matrix Framework Delivering 948x Faster Constrained Decoding for LLM Based Generative Retrieval

Google AI introduces STATIC, a sparse matrix framework that accelerates constrained decoding for LLM-based generative retrieval. It addresses the inefficiency of traditional trie implementations on hardware accelerators by flattening the trie into a static Compressed Sparse Row (CSR) matrix, achieving up to 948x speedup and demonstrating improvements in YouTube video recommendations.
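
To make the idea of flattening a trie into a CSR layout concrete, here is a minimal pure-Python sketch. It is not the STATIC implementation: the function names (`build_csr_trie`, `allowed_tokens`, `step`), the three-array layout, and the toy token IDs are all assumptions chosen for illustration, and a real accelerator version would presumably use dense array operations rather than Python loops.

```python
from typing import Dict, List, Sequence, Tuple


def build_csr_trie(
    sequences: Sequence[Sequence[int]],
) -> Tuple[List[int], List[int], List[int]]:
    """Flatten a trie over token-ID sequences into CSR-style arrays.

    Row i describes trie node i (node 0 is the root):
      col_idx[row_ptr[i]:row_ptr[i + 1]] -> token IDs allowed after node i
      child[row_ptr[i]:row_ptr[i + 1]]   -> node reached by each of those tokens
    """
    # Step 1: build an ordinary pointer-based trie, one dict per node.
    nodes: List[Dict[int, int]] = [{}]
    for seq in sequences:
        cur = 0
        for tok in seq:
            if tok not in nodes[cur]:
                nodes[cur][tok] = len(nodes)
                nodes.append({})
            cur = nodes[cur][tok]

    # Step 2: flatten the per-node dicts into three flat arrays.
    row_ptr: List[int] = [0]
    col_idx: List[int] = []
    child: List[int] = []
    for edges in nodes:
        for tok in sorted(edges):
            col_idx.append(tok)
            child.append(edges[tok])
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, child


def allowed_tokens(row_ptr: List[int], col_idx: List[int], node: int) -> List[int]:
    """Token IDs that keep the output inside the allowed set from `node`."""
    return col_idx[row_ptr[node]:row_ptr[node + 1]]


def step(
    row_ptr: List[int], col_idx: List[int], child: List[int], node: int, tok: int
) -> int:
    """Follow `tok` from `node`; raise if the token is not permitted there."""
    for k in range(row_ptr[node], row_ptr[node + 1]):
        if col_idx[k] == tok:
            return child[k]
    raise ValueError(f"token {tok} is not allowed after node {node}")


# Toy allowed set (hypothetical token IDs): three valid output sequences.
row_ptr, col_idx, child = build_csr_trie([[1, 2, 3], [1, 2, 4], [5, 6]])
```

At each decoding step, `allowed_tokens` gives the IDs whose logits would stay unmasked, and `step` advances the trie state after a token is chosen. Because the arrays are fixed once built, the lookup is a slice over contiguous memory rather than a pointer chase, which is the property the CSR layout is meant to provide.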

2026-03-02 Tags: large language model, llm, generative retrieval, constrained decoding, static, sparse matrix, trie, tpu, gpu, google ai, recommendation systems, semantic ids, machine learning by klotz

Generating Structured Outputs from LLMs

An overview of popular techniques to confine LLMs' output to a predefined schema, covering API providers, prompting/reprompting strategies, and constrained decoding.

2025-08-09 Tags: llm, structured output, api, prompting, constrained decoding, regex, json, pydantic by klotz
