Fine-tune DeepSeek models using your own markdown files as training data. Converts your notes/docs into high-quality Q&A pairs using Gemini, then trains a personalized LLM via Tinker cloud GPUs.
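For context, the markdown-to-Q&A step might look roughly like the sketch below, written against the `google-generativeai` Python client; the model name, prompt wording, and JSON output format are illustrative assumptions rather than the repo's actual implementation, and the Tinker training step is omitted.

```python
# Minimal sketch of the markdown -> Q&A step, assuming the google-generativeai
# client. Model choice, prompt, and output format are illustrative only.
import json
import pathlib
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")        # assumption: key comes from env/config
model = genai.GenerativeModel("gemini-1.5-flash")     # hypothetical model choice

def markdown_to_qa_pairs(md_path: str, n_pairs: int = 5) -> list[dict]:
    """Ask Gemini to turn one markdown note into fine-tuning Q&A pairs."""
    text = pathlib.Path(md_path).read_text(encoding="utf-8")
    prompt = (
        f"Generate {n_pairs} question-answer pairs that are fully answerable "
        f"from the notes below. Return a JSON list of objects with 'question' "
        f"and 'answer' keys.\n\nNOTES:\n{text}"
    )
    response = model.generate_content(prompt)
    return json.loads(response.text)  # in practice, validate/repair the JSON first

# The resulting pairs would then be formatted as chat-style training examples
# and handed to the fine-tuning backend (Tinker, per the project description).
```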
New research reveals that DeepSeek-R1 produces code with more security vulnerabilities when prompts contain topics that are politically sensitive in China, such as Tibet or the Uyghurs.
A detailed comparison of the architectures of recent large language models (LLMs), including DeepSeek-V3, OLMo 2, Gemma 3, Mistral Small 3.1, Llama 4, Qwen3, SmolLM3, and Kimi K2, focusing on key design choices and their impact on performance and efficiency.
Details the development and release of DeepCoder-14B-Preview, a 14B parameter code reasoning model achieving performance comparable to o3-mini through reinforcement learning, along with the dataset, code, and system optimizations used in its creation.
Alibaba's Qwen team takes aim at DeepSeek R1 with its latest release, QwQ. Despite having a fraction of R1's claimed 671 billion parameters, Alibaba touts the comparatively compact 32-billion-parameter 'reasoning' model as outperforming R1 in select math, coding, and function-calling benchmarks.
China appears to think homegrown AI startup DeepSeek could become a notable tech success story for the country. After DeepSeek's sudden rise to fame with the release of its open 'reasoning' model, R1, the company is now operating under new, tighter government-influenced restrictions.
Leading AI firms are using 'distillation' to create cheaper and more efficient models, following a technique popularized by DeepSeek's recent success. This process uses a large 'teacher' model to train smaller 'student' models, making AI capabilities more accessible and cost-effective.
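To make the teacher/student idea concrete, here is a minimal, generic sketch of response-based distillation in PyTorch; the temperature, loss weighting, and Hinton-style formulation are illustrative defaults rather than DeepSeek's (or any particular firm's) actual recipe.

```python
# Generic response-based knowledge distillation: the student matches the
# teacher's softened output distribution in addition to the ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage: run the frozen teacher and the student on the same batch,
# then backpropagate only through the student.
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits, labels)
```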
The article discusses the implications of Sam Altman's proposal to modify the social contract in light of advancements in AI, emphasizing the potential risks to marginalized communities and democratic values. It critiques the exclusionary nature of traditional social contract theories and questions the role of tech leaders in shaping societal norms.
The article discusses DeepSeek's significant advancements in large language model (LLM) efficiency, emphasizing its impact on AI development without constituting a fundamental breakthrough in artificial general intelligence (AGI). It highlights the importance of open-source models, China's role in AI progress, and the future shift towards alternative AGI architectures beyond transformers.
The article explores the architectural changes that enable DeepSeek's models to perform well with fewer resources, focusing on Multi-Head Latent Attention (MLA). It traces the evolution of attention mechanisms, from Bahdanau-style additive attention to the Transformer's Multi-Head Attention (MHA), and introduces Grouped-Query Attention (GQA) as a remedy for MHA's memory inefficiencies. The article highlights DeepSeek's competitive performance despite its lower reported training costs.
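To illustrate why GQA reduces memory pressure, the sketch below has several query heads share each key/value head, so far fewer K/V activations need to be cached; the head counts, shapes, and weight handling are arbitrary assumptions for illustration, and this is plain GQA rather than DeepSeek's MLA, which instead compresses keys and values into a low-rank latent.

```python
# Illustrative Grouped-Query Attention (GQA): n_q_heads query heads share
# n_kv_heads key/value heads (n_kv_heads < n_q_heads), shrinking the KV cache
# relative to standard MHA. All dimensions here are arbitrary examples.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads=8, n_kv_heads=2):
    B, S, D = x.shape
    head_dim = D // n_q_heads
    q = (x @ wq).view(B, S, n_q_heads, head_dim).transpose(1, 2)   # (B, Hq,  S, d)
    k = (x @ wk).view(B, S, n_kv_heads, head_dim).transpose(1, 2)  # (B, Hkv, S, d)
    v = (x @ wv).view(B, S, n_kv_heads, head_dim).transpose(1, 2)
    # Each group of Hq/Hkv query heads reuses the same K/V head.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)  # (B, Hq, S, d)
    v = v.repeat_interleave(group, dim=1)
    attn = F.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(B, S, D)

# Example: wk/wv project into only n_kv_heads * head_dim features, so a quarter
# of the K/V activations are cached compared with 8 full KV heads.
B, S, D = 2, 16, 512
head_dim = D // 8
x  = torch.randn(B, S, D)
wq = torch.randn(D, D)
wk = torch.randn(D, 2 * head_dim)
wv = torch.randn(D, 2 * head_dim)
out = grouped_query_attention(x, wq, wk, wv)  # shape (2, 16, 512)
```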