klotz: constrained decoding


  1. This repository provides the official implementation of the STATIC (Sparse Transition-Accelerated Trie Index for Constrained decoding) framework, as described in Su et al., 2026. STATIC is a high-performance method for restricting outputs to a prespecified set during autoregressive decoding from large language models, designed for efficiency on modern hardware accelerators such as GPUs and TPUs.
  2. Google AI introduces STATIC, a sparse matrix framework that accelerates constrained decoding for LLM-based generative retrieval. It addresses the inefficiency of traditional trie implementations on hardware accelerators by flattening the trie into a static Compressed Sparse Row (CSR) matrix, achieving up to 948x speedup and demonstrating improvements in YouTube video recommendations.
  3. An overview of popular techniques to confine LLMs' output to a predefined schema, covering API providers, prompting/reprompting strategies, and constrained decoding.
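
The trie-to-CSR flattening mentioned in the first two entries can be illustrated with a small sketch. This is not the STATIC implementation or its API: the function names (`build_csr_trie`, `allowed_tokens`, `advance`, `mask_logits`) are hypothetical, and plain Python lists stand in for the accelerator arrays a real system would use. It only shows the general idea of storing each trie node's outgoing edges as one row of a Compressed Sparse Row layout, so that the set of permitted next tokens is a contiguous slice.

```python
def build_csr_trie(sequences):
    """Flatten a trie over token-id sequences into CSR-style arrays.

    Hypothetical helper for illustration; not taken from the STATIC repository.
    """
    # Step 1: build an ordinary dictionary-based trie; node 0 is the root.
    children = [{}]
    for seq in sequences:
        node = 0
        for tok in seq:
            if tok not in children[node]:
                children.append({})
                children[node][tok] = len(children) - 1
            node = children[node][tok]

    # Step 2: flatten. Row i of the CSR layout holds the edges leaving node i.
    row_ptr = [0]      # row_ptr[i]:row_ptr[i + 1] delimits node i's edges
    col_idx = []       # token id labelling each edge
    next_state = []    # node reached by following each edge
    for node_children in children:
        for tok in sorted(node_children):
            col_idx.append(tok)
            next_state.append(node_children[tok])
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, next_state


def allowed_tokens(csr, state):
    """Return the token ids permitted from `state` (a contiguous slice)."""
    row_ptr, col_idx, _ = csr
    return col_idx[row_ptr[state]:row_ptr[state + 1]]


def advance(csr, state, token):
    """Follow the edge labelled `token` from `state`, or raise if disallowed."""
    row_ptr, col_idx, next_state = csr
    for k in range(row_ptr[state], row_ptr[state + 1]):
        if col_idx[k] == token:
            return next_state[k]
    raise ValueError(f"token {token} is not allowed in state {state}")


def mask_logits(csr, state, logits):
    """Set the logit of every disallowed token to -inf."""
    allowed = set(allowed_tokens(csr, state))
    return [x if i in allowed else float("-inf") for i, x in enumerate(logits)]
```

As a usage example under the same assumptions, with the allowed set `[[1, 2, 3], [1, 2, 4], [5, 6]]` the root permits tokens 1 and 5, and after decoding 1 then 2 the only permitted continuations are 3 and 4. In a decoding loop, `mask_logits` would be applied before sampling at each step and `advance` after the token is chosen.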
