klotz: torch*

  1. An open-source, theoretical implementation of the Claude Mythos model architecture. The project implements a Recurrent-Depth Transformer (RDT) with three stages: a Prelude, a looped Recurrent Block, and a final Coda. Attention can be switched between Multi-Latent Attention (MLA) and Grouped Query Attention (GQA), and a sparse Mixture of Experts (MoE) design supports compute-adaptive reasoning in continuous latent space.
    Key technical features include:
    * Recurrent-Depth Transformer architecture for implicit chain-of-thought reasoning.
    * LTI-stable (linear time-invariant) injection parameters to prevent residual explosion during training.
    * Support for multiple model scales ranging from 1B to 1T parameters.
    * Adaptive Computation Time (ACT) or a similar halting mechanism to stop looping before the model overthinks.
    * Use of fine-grained MoE with shared experts to balance breadth and depth.
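The sparse MoE with shared experts described above can be illustrated with a minimal sketch, written in plain Python rather than PyTorch so it runs without dependencies. It assumes a linear router, softmax gating with top-k selection and renormalization, and shared experts that always run without gating. The function names (`moe_forward`, `softmax`, `scale`) and the toy experts are hypothetical and are not taken from the project's code.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def moe_forward(
    x: Vector,
    router_weights: Sequence[Vector],
    routed_experts: Sequence[Expert],
    shared_experts: Sequence[Expert],
    top_k: int = 2,
) -> Vector:
    """Sparse MoE (assumed design): all shared experts run, plus only the top-k routed experts."""
    # One router logit per routed expert (a plain dot product here).
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in router_weights]
    probs = softmax(logits)
    # Select the k highest-probability routed experts and renormalize their gates.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    gate_sum = sum(probs[i] for i in top)

    out = [0.0] * len(x)
    # Shared experts see every token and are added without gating.
    for expert in shared_experts:
        out = [o + y for o, y in zip(out, expert(x))]
    # Unselected routed experts are never called, which is what makes the layer sparse.
    for i in top:
        gate = probs[i] / gate_sum
        out = [o + gate * y for o, y in zip(out, routed_experts[i](x))]
    return out


def scale(c: float) -> Expert:
    """Toy expert that multiplies its input by a constant."""
    return lambda v: [c * t for t in v]


routed = [scale(float(c)) for c in range(1, 5)]  # four routed experts: x1, x2, x3, x4
shared = [scale(0.5)]                            # one always-on shared expert
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = moe_forward([2.0, 1.0], router, routed, shared, top_k=2)
print(y)
```

Here only two of the four routed experts execute for this input, while the shared expert always contributes. Per-token compute therefore depends on `top_k` rather than on the total number of experts.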
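The looped Recurrent Block, the LTI-stable injection, and ACT halting can be sketched together in plain Python. This is an illustration under stated assumptions, not the project's implementation: the update rule `h <- A*h + B*e + block(h + e)`, the diagonal decay parameterization `A = exp(-exp(raw_a))` (which keeps every entry strictly inside (0, 1), so the linear part of the recurrence cannot grow without bound), the sigmoid halting head, and all names (`recurrent_depth`, `stable_decay`, `toy_block`) are hypothetical. `e` stands in for the Prelude's output; a Coda would consume the returned state.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def stable_decay(raw_a: Sequence[float]) -> Vector:
    """Map unconstrained parameters to decay factors strictly inside (0, 1).

    Assumed parameterization: A_i = exp(-exp(raw_i)). Because every |A_i| < 1,
    the linear part h <- A * h cannot grow without bound.
    """
    return [math.exp(-math.exp(r)) for r in raw_a]


def recurrent_depth(
    e: Vector,
    block: Callable[[Vector], Vector],
    raw_a: Sequence[float],
    b: Sequence[float],
    halt_w: Sequence[float],
    halt_bias: float,
    max_loops: int = 16,
    eps: float = 0.01,
) -> Tuple[Vector, int]:
    """Loop one shared block over a latent state, halting ACT-style.

    Returns the halting-weighted mix of the per-step states and the number of
    loop iterations actually run.
    """
    a = stable_decay(raw_a)
    h = [0.0] * len(e)
    out = [0.0] * len(e)
    cum = 0.0  # accumulated halting probability
    for step in range(1, max_loops + 1):
        # The same block is reused every iteration; the input e is re-injected each time.
        f = block([hi + ei for hi, ei in zip(h, e)])
        h = [ai * hi + bi * ei + fi for ai, hi, bi, ei, fi in zip(a, h, b, e, f)]
        p = sigmoid(sum(w * hi for w, hi in zip(halt_w, h)) + halt_bias)
        last = cum + p >= 1.0 - eps or step == max_loops
        # On the last step the state gets the leftover mass so the weights sum to 1.
        weight = 1.0 - cum if last else p
        out = [o + weight * hi for o, hi in zip(out, h)]
        if last:
            return out, step
        cum += p
    return out, 0  # only reached when max_loops < 1


def toy_block(v: Vector) -> Vector:
    # Stand-in for the shared transformer block: a bounded elementwise nonlinearity.
    return [math.tanh(0.5 * t) for t in v]


out, steps = recurrent_depth(
    e=[1.0, -0.5],      # pretend Prelude output
    block=toy_block,
    raw_a=[0.0, 0.0],
    b=[1.0, 1.0],
    halt_w=[0.0, 0.0],  # zero halting head: p = 0.5 at every step
    halt_bias=0.0,
)
print(steps, out)
```

With a zero halting head each step emits p = 0.5, so the loop halts after two iterations and the two states are mixed with weights 0.5 and 0.5. Because the final step always receives the remainder `1 - cum`, the per-step weights sum to 1, as in the original ACT formulation; a larger halting bias stops the loop earlier and a very negative one runs it to `max_loops`.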
