klotz: rstar-math*

0 bookmark(s) - Sort by: Date ↓ / Title / - Bookmarks from other users for this tag

  1. The article presents rStar-Math, a method demonstrating that small language models (SLMs) can rival or surpass the math reasoning capabilities of larger models like OpenAI's without distillation. rStar-Math employs Monte Carlo Tree Search (MCTS) for 'deep thinking', using a math policy SLM guided by an SLM-based process reward model. It introduces three innovations: a code-augmented CoT data synthesis method for training the policy SLM, a novel process reward model training method avoiding step-level score annotation, and a self-evolution recipe where both the policy SLM and process preference model are iteratively improved. Through self-evolution with millions of solutions for 747k math problems, rStar-Math achieves state-of-the-art math reasoning, significantly improving performance on benchmarks like MATH and AIME.

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: Tags: rstar-math

About - Propulsed by SemanticScuttle