SemanticScuttle - klotz.me » klotz: budget forcing

klotz: budget forcing

A new test-time scaling method called budget forcing boosts LLM reasoning without increasing model size, and the resulting model outperforms OpenAI's o1-preview on competition math.

This method, developed by researchers at Stanford University, controls the computational effort an LLM expends during inference, either stopping its reasoning early or forcing it to think longer. The researchers curated a dataset called s1K (1,000 questions paired with reasoning traces), fine-tuned a model on it, and found that the resulting model, s1-32B, outperformed OpenAI's o1-preview on competition math benchmarks by up to 27%.
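A minimal sketch of the two budget-forcing moves described above, assuming a hypothetical `next_token` function that returns one token per call and a placeholder end-of-thinking delimiter `<|end_think|>`. Both names are illustrative and not taken from the s1 code, and `toy_model` is a stand-in for a real LLM. Reaching `max_think` closes the thinking phase early; an attempt to stop before `min_think` is suppressed and replaced with "Wait" so the model keeps reasoning.

```python
from typing import Callable, List

# Hypothetical delimiter; a real checkpoint defines its own end-of-thinking marker.
END_THINK = "<|end_think|>"
WAIT = "Wait"


def budget_force(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    min_think: int,
    max_think: int,
) -> List[str]:
    """Decode a thinking trace whose length is forced into [min_think, max_think] tokens.

    next_token is a hypothetical model interface: it maps the tokens so far
    to the next token (greedy decoding assumed).
    """
    tokens = list(prompt)
    thought = 0  # thinking tokens emitted so far
    while thought < max_think:
        tok = next_token(tokens)
        if tok == END_THINK:
            if thought >= min_think:
                break  # the model stopped on its own inside the budget
            tok = WAIT  # suppress the delimiter and nudge the model to keep reasoning
        tokens.append(tok)
        thought += 1
    # Whether the model stopped or the maximum was hit, close the thinking phase here;
    # on the early-exit path this is what cuts the reasoning short.
    tokens.append(END_THINK)
    return tokens


def toy_model(tokens: List[str]) -> str:
    """Stand-in model: tries to stop after three consecutive "step" tokens."""
    return END_THINK if tokens[-3:] == ["step"] * 3 else "step"


# Think longer: the first stop attempt (after 3 tokens) falls below min_think=5.
print(budget_force(toy_model, ["Q"], min_think=5, max_think=10))
# Stop early: a cap of 2 thinking tokens cuts the trace short.
print(budget_force(toy_model, ["Q"], min_think=0, max_think=2))
```

In the first call one "Wait" is inserted before the toy model stops on its own; in the second the trace ends after two thinking tokens. In a real setup this logic would wrap a model's decoding loop, and the s1 paper describes optionally appending a cue such as "Final Answer:" after the forced delimiter on the early-exit path.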

2025-02-14 Tags: llm, test-time scaling, budget forcing, reasoning, budget by klotz

arXiv s1: Simple test-time scaling

The paper seeks the simplest way to achieve test-time scaling, an approach to language modeling that improves performance by spending additional compute at inference time. The authors combine a small curated dataset with budget forcing, a technique for controlling how much compute the model uses, which lets the model double-check its answers and improve its reasoning. They demonstrate the approach by fine-tuning the Qwen2.5-32B-Instruct language model, showing significant improvements on competition math questions.
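The double-checking behaviour can also be framed as a count of forced continuations rather than a token floor. The sketch below, again assuming a hypothetical `next_token` interface, a placeholder `<|end_think|>` delimiter, and a toy stand-in model, suppresses the model's first `num_waits` attempts to end its reasoning and appends "Wait" each time; `hard_cap` is an assumed safety bound so decoding always terminates. It illustrates the idea only and is not s1's implementation.

```python
from typing import Callable, List, Tuple

END_THINK = "<|end_think|>"  # hypothetical end-of-thinking delimiter
WAIT = "Wait"


def force_continuations(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    num_waits: int,
    hard_cap: int,
) -> Tuple[List[str], int]:
    """Ignore the model's first `num_waits` attempts to stop thinking.

    Each suppressed delimiter is replaced by "Wait", inviting the model to
    re-check its work; `hard_cap` bounds the number of thinking tokens.
    """
    tokens = list(prompt)
    waits_used = 0
    for _ in range(hard_cap):
        tok = next_token(tokens)
        if tok == END_THINK:
            if waits_used >= num_waits:
                break  # enough forced double-checks; let the model stop
            waits_used += 1
            tok = WAIT
        tokens.append(tok)
    tokens.append(END_THINK)  # close the thinking phase before the answer
    return tokens, waits_used


def toy_model(tokens: List[str]) -> str:
    """Stand-in model: tries to stop after three consecutive "step" tokens."""
    return END_THINK if tokens[-3:] == ["step"] * 3 else "step"


trace, used = force_continuations(toy_model, ["Q"], num_waits=2, hard_cap=50)
print(used, trace.count("step"))
```

Raising `num_waits` lengthens the trace in fixed increments, which gives a simple knob for trading extra inference compute for more self-checking.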

2025-02-14 Tags: arxiv, test-time scaling, budget forcing, llm, qwen2.5-32b-instruct, sft, fine tuning, reinforcement learning, machine learning, deepseek-r1 by klotz

GitHub s1: Simple test-time scaling

This repository collects the resources for the paper 's1: Simple test-time scaling', which gives minimal recipes for test-time scaling and strong reasoning performance. Its README covers artifacts, repository structure, inference, training, evaluation, data, visuals, and citation details.

2025-02-14 Tags: test-time scaling, budget forcing, reasoning performance, github, llm, s1, machine learning, distillation by klotz
