This paper surveys recent replication studies of DeepSeek-R1, focusing on Supervised Fine-Tuning (SFT) and Reinforcement Learning from Verifiable Rewards (RLVR). It details data construction, method design, and training procedures, offering practical insights and outlining future research directions for reasoning language models.
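To make the RLVR setup concrete, the sketch below shows the kind of rule-based verifiable reward such pipelines rely on; the `\boxed{}` answer format and exact string match are illustrative assumptions, not details taken from the surveyed papers.

```python
import re


def verifiable_math_reward(completion: str, gold_answer: str) -> float:
    """Return 1.0 if the model's final boxed answer matches the reference, else 0.0.

    A binary, programmatically checkable reward like this requires no learned
    reward model, which is the core idea behind RLVR-style training.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    predicted = match.group(1).strip()
    return 1.0 if predicted == gold_answer.strip() else 0.0


if __name__ == "__main__":
    print(verifiable_math_reward(r"... so the answer is \boxed{42}.", "42"))  # 1.0
    print(verifiable_math_reward("I am not sure.", "42"))                     # 0.0
```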
The article introduces test-time scaling, an approach that improves reasoning performance by spending additional compute at inference time. The authors pair a small curated fine-tuning dataset with a technique called budget forcing, which controls how much compute the model spends at test time by either cutting short or extending its reasoning trace, allowing it to double-check its answers. The approach is demonstrated with the Qwen2.5-32B-Instruct language model, showing substantial gains on competition math questions.
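As a rough illustration of how budget forcing can be wired into decoding, the sketch below suppresses an early end of thinking (appending a "Wait" cue so the model keeps reasoning) and forces the trace to close once a compute budget is exhausted. The `<think>`/`</think>` delimiters, the "Wait" cue, and the chunk-based budget are assumptions for illustration, not the paper's exact implementation.

```python
from typing import Callable


def budget_forced_decode(
    generate_step: Callable[[str], str],   # returns the next chunk of model text
    prompt: str,
    min_thinking_chunks: int = 2,          # force at least this much reasoning
    max_thinking_chunks: int = 8,          # cap on test-time compute
    end_think: str = "</think>",
    continue_cue: str = "Wait",
) -> str:
    """Control test-time compute by suppressing or forcing the end of the
    reasoning trace, so the model re-checks its work before answering."""
    text = prompt + "<think>"
    chunks = 0
    while chunks < max_thinking_chunks:
        out = generate_step(text)
        chunks += 1
        if end_think in out:
            if chunks >= min_thinking_chunks:
                text += out
                break
            # Model tried to stop early: strip the delimiter and append a
            # cue ("Wait") so it keeps reasoning and double-checks its answer.
            out = out.split(end_think)[0] + continue_cue
        text += out
    else:
        # Budget exhausted without a natural stop: force the end of thinking.
        text += end_think
    return text + generate_step(text)      # final answer after the thinking trace


if __name__ == "__main__":
    # Toy stand-in for a language model, just to show the control flow.
    replies = iter([
        "Let me try x=3... </think>",
        " Checking again, x=4.",
        " Done.</think>",
        " Final answer: 4",
    ])
    print(budget_forced_decode(lambda _: next(replies), "Solve: x+1=5 "))
```

In this toy run, the first attempt to close the trace is suppressed because the minimum budget has not been reached, so the model is nudged to keep reasoning before it is finally allowed to answer.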