SemanticScuttle - klotz.me » klotz: rstar-math

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

The paper presents rStar-Math, a method showing that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1 without distillation from superior models. rStar-Math performs 'deep thinking' through Monte Carlo Tree Search (MCTS), in which a math policy SLM searches over solution steps guided by an SLM-based process reward model. It introduces three innovations: (1) a code-augmented CoT data synthesis method for training the policy SLM; (2) a process reward model training method that avoids step-level score annotation and yields a process preference model (PPM); and (3) a self-evolution recipe in which the policy SLM and the PPM are iteratively improved. Through self-evolution with millions of synthesized solutions for 747k math problems, rStar-Math achieves state-of-the-art math reasoning, significantly improving performance on benchmarks such as MATH and AIME.
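To make the search idea in the summary concrete, here is a minimal toy sketch of MCTS in which one function proposes candidate next steps and another scores them. It is not the paper's implementation: `propose_steps` stands in for the policy SLM, `score_step` stands in for the process reward model, and the "math problem" (reach `TARGET` from a starting number using +1 or ×2) is a hypothetical placeholder chosen only so the code runs without any model.

```python
import math
from dataclasses import dataclass, field

TARGET = 10      # hypothetical toy goal: reach this number
MAX_DEPTH = 5    # maximum number of reasoning steps per trajectory


@dataclass
class Node:
    state: int                       # toy "partial solution": a running number
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def q(self) -> float:
        # Mean reward observed through this node (0.0 if never visited).
        return self.value_sum / self.visits if self.visits else 0.0


def propose_steps(state: int) -> list:
    """Stand-in for the policy model: candidate next steps (assumption)."""
    return [state + 1, state * 2]


def score_step(state: int) -> float:
    """Stand-in for the reward model: score in (0, 1], 1.0 at TARGET (assumption)."""
    return 1.0 / (1.0 + abs(TARGET - state))


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    # Standard UCT: exploit the mean value, explore rarely visited children.
    if child.visits == 0:
        return math.inf
    return child.q() + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(root_state: int, rollouts: int = 200) -> Node:
    root = Node(root_state)
    for _ in range(rollouts):
        node, depth = root, 0
        # Selection: descend by UCT until reaching a leaf or the depth limit.
        while node.children and depth < MAX_DEPTH:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
            depth += 1
        # Expansion: ask the "policy" for next steps unless the node is terminal.
        if depth < MAX_DEPTH and node.state != TARGET:
            node.children = [Node(s, parent=node) for s in propose_steps(node.state)]
            node = node.children[0]
        # Evaluation: score the reached step with the "reward model".
        reward = score_step(node.state)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return root


def best_path(root: Node) -> list:
    # Follow the most-visited child at each level.
    path, node = [root.state], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        path.append(node.state)
    return path


if __name__ == "__main__":
    tree = search(1)
    print(best_path(tree))
```

In this sketch the per-node mean value plays the role of a step-level quality estimate gathered from search rather than from hand-written step annotations; how rStar-Math actually turns such statistics into training data is described in the paper, not here.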

2025-01-11 Tags: small language models, llm, math, reasoning, self-evolution, rstar-math by klotz

