klotz: moe* + attention*

  1. An in-depth look at the architecture of OpenAI's GPT-OSS models, detailing tokenization, embeddings, transformer blocks, Mixture of Experts, attention mechanisms (GQA and RoPE), and quantization techniques; a minimal sketch of GQA and MoE routing follows this list.
  2. A detailed comparison of the architectures of recent large language models (LLMs) including DeepSeek-V3, OLMo 2, Gemma 3, Mistral Small 3.1, Llama 4, Qwen3, SmolLM3, and Kimi 2, focusing on key design choices and their impact on performance and efficiency.
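
Both bookmarks center on how Mixture-of-Experts layers and attention variants fit together in current LLMs. As a quick orientation, here is a minimal, hypothetical PyTorch sketch of grouped-query attention with RoPE feeding a top-k MoE feed-forward layer. The module names, dimensions, and routing scheme are illustrative assumptions only, not the GPT-OSS implementation or the code of any model compared in the second bookmark.

```python
# Illustrative sketch only: GQA + RoPE attention followed by a top-k MoE FFN.
# All shapes and hyperparameters are made up for demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def apply_rope(x, base=10000.0):
    # x: (batch, heads, seq, head_dim); rotate channel pairs by a
    # position-dependent angle (standard RoPE formulation).
    _, _, s, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, dtype=x.dtype, device=x.device) / half)
    angles = torch.arange(s, dtype=x.dtype, device=x.device)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()            # (seq, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class GQAttention(nn.Module):
    """Grouped-query attention: many query heads share fewer key/value heads."""
    def __init__(self, d_model=256, n_q_heads=8, n_kv_heads=2):
        super().__init__()
        self.n_q, self.n_kv = n_q_heads, n_kv_heads
        self.head_dim = d_model // n_q_heads
        self.wq = nn.Linear(d_model, n_q_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_q_heads * self.head_dim, d_model, bias=False)

    def forward(self, x):
        b, s, _ = x.shape
        q = self.wq(x).view(b, s, self.n_q, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, s, self.n_kv, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, s, self.n_kv, self.head_dim).transpose(1, 2)
        q, k = apply_rope(q), apply_rope(k)
        # Each group of query heads attends against a shared K/V head,
        # which is what shrinks the KV cache relative to full MHA.
        k = k.repeat_interleave(self.n_q // self.n_kv, dim=1)
        v = v.repeat_interleave(self.n_q // self.n_kv, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, s, -1))


class MoEFeedForward(nn.Module):
    """Token-wise top-k routing over a small pool of expert MLPs."""
    def __init__(self, d_model=256, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        scores = F.softmax(self.router(x), dim=-1)        # (b, s, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask, None] * expert(x[mask])
        return out


if __name__ == "__main__":
    x = torch.randn(2, 16, 256)
    y = MoEFeedForward()(GQAttention()(x))
    print(y.shape)  # torch.Size([2, 16, 256])
```

The two ideas the tag combines are visible here: GQA trades a small quality cost for a much smaller KV cache by sharing key/value heads across query-head groups, while the MoE layer keeps total parameters large but activates only `top_k` experts per token at inference time.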
