klotz: reasoning* + math*

0 bookmark(s) - Sort by: Date ↓ / Title / - Bookmarks from other users for this tag

  1. Dolphin 3.0 R1 is an instruct-tuned model designed for general-purpose reasoning, coding, math, and function calling. It is designed to be a local model that businesses can control, including setting the system prompt and alignment.
  2. The article presents rStar-Math, a method demonstrating that small language models (SLMs) can rival or surpass the math reasoning capabilities of larger models like OpenAI's without distillation. rStar-Math employs Monte Carlo Tree Search (MCTS) for 'deep thinking', using a math policy SLM guided by an SLM-based process reward model. It introduces three innovations: a code-augmented CoT data synthesis method for training the policy SLM, a novel process reward model training method avoiding step-level score annotation, and a self-evolution recipe where both the policy SLM and process preference model are iteratively improved. Through self-evolution with millions of solutions for 747k math problems, rStar-Math achieves state-of-the-art math reasoning, significantly improving performance on benchmarks like MATH and AIME.

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: Tags: reasoning + math

About - Propulsed by SemanticScuttle