A new test-time scaling method called budget forcing boosts LLM reasoning without increasing model size, outperforming OpenAI's o1-preview.
This method, developed by researchers at Stanford University, controls the computational effort an LLM expends during inference: it can cut reasoning short by forcing an early end to the model's thinking, or extend it by suppressing the end-of-thinking token and appending "Wait" so the model keeps reasoning. The researchers also curated s1K, a dataset of 1,000 reasoning examples, which they used to fine-tune their model, s1-32B. The resulting model outperformed OpenAI's o1-preview on competitive math benchmarks by up to 27%.
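The control loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `generate()` is a hypothetical stub standing in for a real LLM call, and the parameter names (`min_extensions`, `max_tokens`) are assumptions for exposition.

```python
def generate(prompt, stop):
    """Hypothetical stand-in for an LLM call.

    Returns (generated_text, hit_stop_token). A real implementation
    would stream tokens from a model until `stop` is produced.
    """
    return "step 1 ... step 2", True


def budget_force(prompt, min_extensions=1, max_tokens=100):
    """Budget forcing: bound thinking above and below.

    - If the model tries to stop before `min_extensions` extensions,
      append "Wait" to force it to keep reasoning.
    - If the reasoning trace exceeds `max_tokens`, cut it short.
    """
    trace = ""
    extensions = 0
    while True:
        chunk, stopped = generate(prompt + trace, stop="</think>")
        trace += chunk
        if len(trace.split()) >= max_tokens:
            # Upper bound hit: terminate thinking and move to the answer.
            break
        if stopped and extensions < min_extensions:
            # Model stopped too early: suppress the stop and extend thinking.
            trace += " Wait"
            extensions += 1
            continue
        break
    return trace
```

With a real model behind `generate()`, varying `min_extensions` and `max_tokens` is what lets accuracy scale with test-time compute.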