A specialized implementation of a 25,000-parameter decoder-only transformer designed to run on an unmodified Commodore 64. Written in hand-coded 6502 assembly, the model implements real multi-head causal self-attention, RMSNorm, and softmax. These are the same core building blocks used in modern LLM architectures, here running within the constraints of the C64's roughly 1 MHz 8-bit processor.
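RMSNorm can be computed entirely with integers once activations are in a fixed-point format. Below is a minimal Python sketch. It assumes Q8.8 activations and Q8.8 gains (value = integer / 256) and uses an integer square root. The function names and the epsilon choice (one least-significant bit of the Q16.16 mean) are hypothetical. The project's 6502 routines may be organized differently. Python integers are unbounded, whereas real 8-bit code has to carry the 32-bit sum of squares explicitly.

```python
import math

Q = 8          # fractional bits in Q8.8
ONE = 1 << Q   # 1.0 in Q8.8 == 256


def to_q88(x: float) -> int:
    """Convert a float to the nearest Q8.8 integer (illustrative helper)."""
    return int(round(x * ONE))


def rmsnorm_q88(x: list[int], gain: list[int]) -> list[int]:
    """Hypothetical integer RMSNorm: Q8.8 inputs and gains -> Q8.8 outputs."""
    n = len(x)
    # Each v * v is Q16.16; the mean of the squares stays in Q16.16.
    mean_sq = sum(v * v for v in x) // n
    # Add one Q16.16 LSB as epsilon so the RMS is never zero.
    # The integer sqrt of a Q16.16 value is a Q8.8 value.
    rms = math.isqrt(mean_sq + 1)
    out = []
    for v, g in zip(x, gain):
        # Pre-shift the numerator so that Q8.8 / Q8.8 stays in Q8.8.
        normed = (v << Q) // rms
        # Q8.8 * Q8.8 = Q16.16; shift right to return to Q8.8.
        out.append((normed * g) >> Q)
    return out
```

For example, `rmsnorm_q88([512, 0], [256, 256])` normalizes the vector (2.0, 0.0) and returns `[362, 0]`, which is about 1.414 in Q8.8. That matches 2 / sqrt(2).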
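In causal self-attention, each position may attend only to itself and to earlier positions. The minimal single-head sketch below shows that masking together with a numerically stable softmax. It uses floating point for readability and leaves out the split into multiple heads. The C64 version works in fixed point, and the function name and list-of-vectors layout are assumptions.

```python
import math


def causal_attention(q, k, v):
    """Single-head causal attention (float sketch).

    q, k, v: lists of T vectors; q and k share dimension d.
    Returns T output vectors with the same dimension as v.
    """
    T, d = len(q), len(q[0])
    out = []
    for t in range(T):
        # Causal mask: position t only scores keys 0..t.
        scores = [sum(a * b for a, b in zip(q[t], k[j])) / math.sqrt(d)
                  for j in range(t + 1)]
        # Subtract the max before exponentiating so exp() cannot overflow.
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        # Softmax-weighted sum of the visible value vectors.
        out.append([sum(w[j] * v[j][i] for j in range(t + 1)) / z
                    for i in range(len(v[0]))])
    return out
```

With all-zero queries and keys, each position averages the values it can see. The first token returns its own value, and the second returns the mean of the first two values.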
Key technical details include:
- Stores parameters as int8 values, with each tensor scaled by a power of two (a bit shift) instead of a floating-point factor.
- Uses Q8.8 fixed-point arithmetic (16 bits: 8 integer, 8 fractional) for activations.
- Features a 128-token BPE vocabulary and a 20-token context window.
- Includes tools for quantization-aware training (QAT), so the model learns to tolerate integer rounding and keeps its accuracy on integer-only hardware.
- Runs on real C64 hardware or in emulators such as VICE, averaging about 60 seconds per token.
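Per-tensor shift scaling can be read as follows. Each tensor gets a single shift `s`, and its weights are stored as `round(w * 2**s)` clamped to the int8 range. A dot product with Q8.8 activations is then rescaled with a right shift rather than a multiply, which matters on a CPU with no hardware multiplier. The sketch below assumes the shift is chosen as the largest value that keeps the biggest weight within int8. That rule and all the names are assumptions, not taken from the project.

```python
def choose_shift(weights: list[float], max_shift: int = 15) -> int:
    """Largest s (up to max_shift) with max|w| * 2**s still inside int8."""
    m = max(abs(w) for w in weights)
    s = 0
    while s < max_shift and m * (1 << (s + 1)) <= 127:
        s += 1
    return s


def quantize(weights: list[float]) -> tuple[list[int], int]:
    """Quantize one tensor to int8 with a shared power-of-two scale 2**s."""
    s = choose_shift(weights)
    q = [max(-127, min(127, round(w * (1 << s)))) for w in weights]
    return q, s


def dot_q88(q: list[int], s: int, x: list[int]) -> int:
    """int8 weights (scale 2**s) dotted with Q8.8 activations -> Q8.8."""
    acc = sum(wi * xi for wi, xi in zip(q, x))  # Q8.8, still scaled by 2**s
    return acc >> s  # arithmetic right shift removes the weight scale
```

For weights (0.5, -0.25, 0.125) and activations (1, 2, 3), the shift comes out as 7, the weights become `[64, -32, 16]`, and the result is 96 in Q8.8. That equals 0.5 - 0.5 + 0.375 = 0.375.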
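Quantization-aware training usually adds "fake quantization" to the training forward pass. Values are rounded onto the same grids the integer hardware will use and then converted back to floats, so the trained weights already absorb the rounding error. The sketch below shows two such rounding functions. It assumes a power-of-two int8 scheme for weights and saturating Q8.8 for activations. The project's actual training tools are not described here, and these function names are hypothetical. In a real training loop, gradients would typically pass through the rounding unchanged (a straight-through estimator). That requires an autodiff framework and is not shown.

```python
def choose_shift(weights: list[float], max_shift: int = 15) -> int:
    """Largest s (up to max_shift) with max|w| * 2**s still inside int8."""
    m = max(abs(w) for w in weights)
    s = 0
    while s < max_shift and m * (1 << (s + 1)) <= 127:
        s += 1
    return s


def fake_quant_weights(weights: list[float]) -> list[float]:
    """Round a tensor onto its int8 power-of-two grid, then back to float."""
    scale = 1 << choose_shift(weights)
    return [max(-127, min(127, round(w * scale))) / scale for w in weights]


def fake_quant_q88(x: float) -> float:
    """Round to the nearest 1/256 and saturate to the signed 16-bit range."""
    q = max(-32768, min(32767, round(x * 256)))
    return q / 256
```

For example, `fake_quant_weights([0.5, -0.25, 0.1])` gives `[0.5, -0.25, 0.1015625]`. The 0.1 lands on the nearest multiple of 1/128. `fake_quant_q88(1000.0)` saturates to about 127.996, the largest Q8.8 value.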