
klotz: deepseek r1* + reinforcement learning*

TinyZero

TinyZero is a reproduction of DeepSeek R1-Zero on the Countdown and multiplication tasks. Built on veRL, it uses reinforcement learning to let a 3B-parameter base language model develop self-verification and search abilities.
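R1-Zero-style training depends on a reward that can be checked by rules rather than by a learned reward model. In Countdown, the model gets a set of numbers and a target and must combine the numbers with basic arithmetic to reach the target. The sketch below shows one plausible reward for that task. It is not TinyZero's code, and several details are assumptions: the final expression appears inside `<answer>...</answer>` tags, each given number must be used exactly once, and a well-formatted but wrong answer earns a small partial score (0.1 here). The names `countdown_reward` and `_safe_eval` and the score values are hypothetical.

```python
import ast
import re
from collections import Counter


def _safe_eval(expr: str) -> float:
    """Evaluate an expression restricted to integers, + - * /, and parentheses."""
    ops = {
        ast.Add: lambda a, b: a + b,
        ast.Sub: lambda a, b: a - b,
        ast.Mult: lambda a, b: a * b,
        ast.Div: lambda a, b: a / b,  # may raise ZeroDivisionError
    }

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        # Anything else (names, calls, unary minus, floats...) is rejected.
        raise ValueError("disallowed syntax")

    return ev(ast.parse(expr, mode="eval"))


def countdown_reward(completion: str, numbers: list, target: int,
                     format_score: float = 0.1,
                     correct_score: float = 1.0) -> float:
    """Hypothetical rule-based reward for one Countdown completion.

    Assumptions (not taken from TinyZero): answer lives in <answer> tags,
    every number is used exactly once, and format-only credit is 0.1.
    """
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # no parseable answer at all
    expr = match.group(1).strip()

    # The multiset of integers in the expression must equal the given numbers.
    used = [int(n) for n in re.findall(r"\d+", expr)]
    if Counter(used) != Counter(numbers):
        return format_score

    try:
        value = _safe_eval(expr)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return format_score

    return correct_score if abs(value - target) < 1e-6 else format_score
```

For example, with numbers `[2, 5, 6]` and target `20`, the completion `<answer>(6 - 2) * 5</answer>` gets the full score, `<answer>6 + 2 + 5</answer>` gets only the format score, and a reply with no answer tags gets 0. A sparse, verifiable signal like this is what lets the policy discover checking and backtracking behavior through RL instead of imitating worked solutions.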

2025-02-01 Tags: deepseek r1, reinforcement learning, tinyzero, llm by klotz
