klotz: budget forcing*

0 bookmark(s) - Sort by: Date ↓ / Title / - Bookmarks from other users for this tag

  1. A new test-time scaling method called budget forcing boosts LLM reasoning without increasing model size, outperforming OpenAI's o1-preview.

    This method, developed by researchers at Stanford University, controls the computational effort an LLM expends during inference, allowing it to either stop reasoning early or think longer. The researchers created a curated dataset called s1K to test this method and found that their model, s1-32B, outperformed OpenAI’s o1-preview model on competitive math benchmarks by up to 27%.

  2. The article introduces a new approach to language modeling called test-time scaling, which enhances performance by utilizing additional compute resources during testing. The authors present a method involving a curated dataset and a technique called budget forcing to control compute usage, allowing models to double-check answers and improve reasoning. The approach is demonstrated with the Qwen2.5-32B-Instruct language model, showing significant improvements on competition math questions.

  3. This repository provides an overview of resources for the paper 's1: Simple test-time scaling', which includes minimal recipes for test-time scaling and strong reasoning performance. It covers artifacts, structure, inference, training, evaluation, data, visuals, and citation details.

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: Tags: budget forcing

About - Propulsed by SemanticScuttle