SemanticScuttle - klotz.me » Tags: kyegomez

Tags: kyegomez*


An open-source, theoretical implementation of the Claude Mythos model architecture. The project implements a Recurrent-Depth Transformer (RDT) with three stages: a Prelude, a looped Recurrent Block, and a final Coda. Attention is switchable between Multi-Latent Attention (MLA) and Grouped Query Attention (GQA), and a sparse Mixture of Experts (MoE) design supports compute-adaptive reasoning in continuous latent space.
Key technical features include:
* Recurrent-Depth Transformer architecture for implicit chain-of-thought reasoning.
* LTI-stable injection parameters to prevent residual explosion during training.
* Support for multiple model scales ranging from 1B to 1T parameters.
* Integration of Adaptive Computation Time (ACT) or similar halting mechanisms to manage overthinking.
* Use of fine-grained MoE with shared experts to balance breadth and depth.
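
The Prelude, looped Recurrent Block, and Coda stages, together with a stable injection of the input at every loop, can be illustrated with a small dependency-free sketch. This is not the OpenMythos code: the function names, the `exp(-softplus(raw))` parameterization of the decay, the `tanh` stand-ins for the transformer stages, and the constants `inject_gain` and `0.1` are all assumptions chosen to show why a decay factor below 1 keeps the hidden state bounded no matter how many loops run.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def stable_decay(raw: float) -> float:
    """Map an unconstrained parameter to a decay factor.

    Mathematically the result lies strictly inside (0, 1); in floating
    point it can round to 1.0 for very negative inputs.
    (Hypothetical parameterization, assumed for illustration.)
    """
    return math.exp(-softplus(raw))


def recurrent_depth_forward(x, n_loops, raw_decay=0.0, inject_gain=0.5):
    """Toy Prelude -> looped block -> Coda pass over a list of floats."""
    # Prelude (stand-in): encode the input once.
    e = [math.tanh(v) for v in x]

    a = stable_decay(raw_decay)  # decay on the carried hidden state
    h = [0.0] * len(e)
    for _ in range(n_loops):
        # Stand-in for the shared recurrent block: a small bounded update.
        update = [0.1 * math.tanh(v) for v in h]
        # Linear, time-invariant part: decay the state and re-inject the input.
        h = [a * hv + inject_gain * ev + u for hv, ev, u in zip(h, e, update)]

    # Coda (stand-in): read out from the final hidden state.
    out = [math.tanh(v) for v in h]
    return h, out


if __name__ == "__main__":
    for loops in (1, 4, 16, 64):
        h, out = recurrent_depth_forward([1.0, -2.0], n_loops=loops)
        print(loops, [round(v, 4) for v in h])
```

With the default values the decay is 0.5, so the hidden state settles toward a fixed point instead of growing with the loop count. Replacing `a` with a value of 1 or more would remove that guarantee, which is the failure mode the stability constraint is meant to rule out.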

2026-04-26 Tags: ai, ml, torch, pytorch, attention, looped-transformers, claude-mythos, moe, transformer, github, kyegomez by klotz

Meet OpenMythos: An Open-Source PyTorch Reconstruction of Claude Mythos Where 770M Parameters Match a 1.3B Transformer

OpenMythos is an open-source PyTorch project by Kye Gomez that proposes a theoretical reconstruction of Anthropic's Claude Mythos architecture. Instead of a stack of distinct transformer layers, it proposes a Recurrent-Depth Transformer (RDT) design in which the same weights are reused across multiple loop iterations, so reasoning depth grows at inference time without adding parameters. By combining Mixture-of-Experts with Multi-Latent Attention and stability constraints, a 770M-parameter model is reported to match the performance of a 1.3B-parameter standard transformer.

* open-source PyTorch reconstruction of Claude Mythos
* proposes recurrent-depth transformer architecture
* reasoning depth scales via inference-time loops rather than parameter count
* uses mixture-of-experts for domain breadth
* implements multi-latent attention to reduce memory usage
* employs LTI injection for stability and adaptive computation time for halting
* reportedly matches a 1.3B-parameter transformer with only 770M parameters
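
The halting idea behind adaptive computation time can be sketched in a few lines. The version below follows the general ACT recipe (stop looping once the accumulated halting probability crosses a threshold, and give the last step the leftover weight); whether OpenMythos uses exactly this rule is not stated in the summaries above, and the names `act_halt`, `act_mix`, and the `threshold` default are hypothetical.

```python
def act_halt(halt_probs, threshold=0.99):
    """Decide how many loop iterations to run.

    halt_probs: per-iteration halting probabilities in [0, 1], in loop order.
    Returns (steps_used, weights), where weights sum to 1 and the final
    step receives the remainder.
    """
    if not halt_probs:
        raise ValueError("halt_probs must contain at least one step")

    cumulative = 0.0
    weights = []
    last = len(halt_probs)
    for step, p in enumerate(halt_probs, start=1):
        # Halt when the budget is used up, or when the loop limit is reached.
        if cumulative + p >= threshold or step == last:
            weights.append(1.0 - cumulative)
            return step, weights
        weights.append(p)
        cumulative += p


def act_mix(states, weights):
    """Blend per-iteration states using the halting weights."""
    return sum(w * s for w, s in zip(weights, states))


if __name__ == "__main__":
    steps, weights = act_halt([0.1, 0.2, 0.8, 0.5])
    print(steps, weights)
    print(act_mix([1.0, 2.0, 3.0], weights))
```

An input whose halting probabilities rise quickly stops after a few loops, while a harder input keeps iterating up to the loop limit, which is how a looped model can spend more compute only where it is needed.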

2026-04-26 Tags: open mythos, recurrent-depth transformers, mixture-of-experts, multi-latent attention, continuous latent space reasoning, asif razzaq, deep learning, kyegomez by klotz
