This page details the DeepSeek-R1-0528-Qwen3-8B model, a distillation of DeepSeek-R1-0528's chain-of-thought reasoning into the smaller Qwen3-8B base, highlighting its improved reasoning capabilities, evaluation results, usage guidelines, and licensing information. It offers various GGUF quantization options for local execution.
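To make the "local execution" part concrete, here is a minimal sketch of loading one of the GGUF quantizations with llama-cpp-python. The file name and the sampling settings are illustrative assumptions, not values taken from the model page:

```python
# Minimal sketch: running a GGUF quantization locally with llama-cpp-python.
# The model file name and parameters below are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=8192,        # context window; raise it if your RAM allows
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 17 * 24? Think step by step."}],
    temperature=0.6,   # assumed; reasoning models are often run at moderate temperature
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```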
Alibaba's Qwen team released the Qwen 3 model family, offering a range of sizes and capabilities. The article discusses the models' features and performance, the well-coordinated release across the LLM ecosystem, and the broader trend of better models running on the same hardware.
A new study reveals that while current AI models excel at solving math *problems*, they struggle with the *reasoning* required for mathematical *proofs*, demonstrating a gap between pattern recognition and genuine mathematical understanding.
This paper proposes the Knowledge Graph of Thoughts (KGoT) architecture for AI assistants, integrating LLM reasoning with dynamically constructed knowledge graphs to reduce costs and improve performance on complex tasks such as those in the GAIA benchmark.
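As a rough, hypothetical sketch of the pattern KGoT describes, namely an LLM that alternates between adding facts to a task-specific knowledge graph and checking whether the graph can answer the task, here is a toy loop using networkx with a stubbed LLM call. None of the function names come from the paper:

```python
# Hypothetical sketch of the KGoT pattern: iteratively grow a knowledge graph
# of (subject, relation, object) triples, then reason over its structure.
# `llm_call` is a stand-in for any LLM API, not the paper's interface.
import networkx as nx

def solve(task, llm_call, max_iters=5):
    g = nx.MultiDiGraph()
    for _ in range(max_iters):
        # In a real system the prompt would include the task and current graph.
        for subj, rel, obj in llm_call(task, g):
            g.add_edge(subj, obj, relation=rel)
        # Placeholder termination check; a real system would query the graph
        # to see whether it already supports an answer.
        if g.number_of_edges() > 0:
            break
    return g

# Toy stub standing in for an actual LLM call:
demo = solve("Who directed Alien?", lambda t, g: [("Ridley Scott", "directed", "Alien")])
print(list(demo.edges(data=True)))
```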
A new paper by researchers from Google Research and UC Berkeley shows that a simple sampling-based search approach can enhance the reasoning abilities of large language models (LLMs) without needing specialized training or complex architectures.
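The core idea is simple enough to sketch: sample many candidate answers at a non-zero temperature, have the model verify each one, and keep the best. The helper names and the scoring interface below are assumptions, not the paper's exact procedure:

```python
# Minimal sketch of sampling-based search with self-verification.
# `generate` and `score_correctness` are stand-ins for calls to any LLM API.
def sampling_based_search(question, generate, score_correctness, k=16):
    candidates = [generate(question, temperature=0.8) for _ in range(k)]
    # Ask the model to judge each candidate's correctness, then keep the best.
    scored = [(score_correctness(question, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[0][1]  # best-verified candidate
```

Because it only requires more samples at inference time, this kind of search scales reasoning quality with compute rather than with retraining.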
ByteDance Research has released DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), an open-source reinforcement learning system for LLMs that aims to improve reasoning abilities and address reproducibility issues. DAPO introduces four techniques, Clip-Higher, Dynamic Sampling, Token-level Policy Gradient Loss, and Overlong Reward Shaping, and scores 50 on the AIME 2024 benchmark with the Qwen2.5-32B model.
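For a sense of what Clip-Higher changes, here is a hedged PyTorch sketch of a PPO-style token-level loss with decoupled clip ranges. The epsilon values follow the paper's reported settings, but the surrounding code is illustrative, not the reference implementation:

```python
import torch

def dapo_clip_higher_loss(logp_new, logp_old, advantages,
                          eps_low=0.2, eps_high=0.28):
    """Sketch of DAPO's Clip-Higher in a PPO-style objective.

    Clip-Higher widens only the upper clip bound (eps_high > eps_low), so
    low-probability tokens can still gain probability mass, which the paper
    argues counteracts entropy collapse. All tensors here are flat over
    tokens, so the mean is token-level rather than per-sample, echoing the
    token-level policy gradient loss idea.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```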
How small can a capable reasoning model be? Alibaba's Qwen team aims to find out with its latest release, QwQ. Despite having a fraction of DeepSeek R1's claimed 671 billion parameters, Alibaba touts its comparatively compact 32-billion-parameter 'reasoning' model as outperforming R1 on select math, coding, and function-calling benchmarks.
Improving the reasoning capabilities of large language models (LLMs) typically requires supervised fine-tuning with labeled data or computationally expensive sampling. We introduce Unsupervised Prefix Fine-Tuning (UPFT), which leverages Prefix Self-Consistency -- the observation that diverse solution trajectories share the same initial reasoning steps -- to enhance LLM reasoning efficiency. By training exclusively on the initial prefix substrings (as few as 8 tokens), UPFT removes the need for labeled data or exhaustive sampling. Experiments on reasoning benchmarks show that UPFT matches the performance of supervised methods such as Rejection Sampling Fine-Tuning, while reducing training time by 75% and sampling cost by 99%.
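The abstract implies a very simple data-preparation step: keep only the first few tokens of each sampled trajectory as the fine-tuning target. A hedged sketch using a standard Hugging Face tokenizer; the helper names are illustrative, only the truncation idea is from the abstract:

```python
# Sketch of UPFT-style data preparation: for each question, sample one
# unlabeled solution and keep only its first `prefix_len` tokens as the
# training completion. `sample_solution` is a stand-in for any LLM call.
def build_prefix_dataset(questions, sample_solution, tokenizer, prefix_len=8):
    examples = []
    for q in questions:
        solution = sample_solution(q)                 # one unlabeled rollout
        ids = tokenizer(solution, add_special_tokens=False)["input_ids"]
        prefix = tokenizer.decode(ids[:prefix_len])   # initial reasoning steps only
        examples.append({"prompt": q, "completion": prefix})
    return examples
```

Because no answer labels are checked and only one short rollout per question is kept, both the labeling and the sampling costs drop sharply, consistent with the reductions the paper reports.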
The article introduces Huginn-3.5B, a novel AI reasoning model developed by researchers from multiple institutions. It uses a recurrent-depth approach: instead of spelling out intermediate reasoning as generated tokens, it iteratively refines a hidden state within a latent space. This lets it dynamically allocate computational resources at inference time and perform efficiently across various tasks without needing specialized training data.
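A toy PyTorch sketch of the recurrent-depth idea, applying one core block a variable number of times to refine a latent state; the shapes and the re-injection of the input at each step are illustrative assumptions, not Huginn's actual architecture:

```python
import torch
import torch.nn as nn

class RecurrentDepthBlock(nn.Module):
    """Toy sketch: one core block applied repeatedly to refine a latent
    state, instead of emitting intermediate reasoning tokens."""
    def __init__(self, dim):
        super().__init__()
        self.core = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                               batch_first=True)

    def forward(self, x, num_iterations=4):
        h = torch.zeros_like(x)           # latent state to be refined
        for _ in range(num_iterations):   # more iterations = more "thinking"
            h = self.core(h + x)          # re-inject the input each step
        return h

# Compute scales at inference simply by raising num_iterations:
block = RecurrentDepthBlock(dim=64)
tokens = torch.randn(1, 10, 64)
easy = block(tokens, num_iterations=2)
hard = block(tokens, num_iterations=16)
```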
A new test-time scaling method called budget forcing boosts LLM reasoning without increasing model size, outperforming OpenAI's o1-preview.
This method, developed by researchers at Stanford University, controls the computational effort an LLM expends during inference, allowing it to either stop reasoning early or think longer. The researchers created a curated dataset called s1K to test this method and found that their model, s1-32B, outperformed OpenAI’s o1-preview model on competitive math benchmarks by up to 27%.
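In decode-loop terms, budget forcing intervenes on the end-of-thinking delimiter: it forces the delimiter when the budget is exhausted, and suppresses it (appending a continuation cue like "Wait") when the model tries to stop too early. A hedged sketch; `generate_until`, the delimiter string, and the word-count budget proxy are stand-ins for whatever the serving stack provides, and only the control logic mirrors the paper's idea:

```python
# Hedged sketch of budget forcing at decode time. `generate_until` is a
# stand-in that returns (generated_text, hit_delimiter). Word counts are
# used as a crude proxy for token counts.
def budget_forced_think(prompt, generate_until, min_tokens=0, max_tokens=4096):
    think, delimiter = "", "</think>"
    while True:
        chunk, hit_delim = generate_until(prompt + think, stop=delimiter,
                                          limit=max_tokens - len(think.split()))
        think += chunk
        if len(think.split()) >= max_tokens:
            break                 # budget exhausted: force thinking to end
        if hit_delim and len(think.split()) < min_tokens:
            think += " Wait"      # suppress the delimiter and keep thinking
            continue
        break
    return think                  # the final answer is generated after this
```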