SemanticScuttle - klotz.me » Tags: 6502 assembly

A 25,000-parameter decoder-only transformer that runs on an unmodified Commodore 64. Written in hand-coded 6502 assembly, the model implements real multi-head causal self-attention, RMSNorm, and softmax, the same building blocks used in modern LLM architectures, despite the extreme constraints of a 1 MHz processor.
Key technical details include:
- Uses int8 quantized parameters with per-tensor shift scaling.
- Implements fixed-point arithmetic (Q8.8) for activations.
- Features a 128-token BPE vocabulary and a 20-token context window.
- Includes tools for quantization-aware training (QAT) to ensure model accuracy on integer hardware.
- Runs on real C64 hardware or emulators like VICE, with performance averaging 60 seconds per token.
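
To make the int8 and Q8.8 points above concrete, here is a minimal Python sketch of how per-tensor shift scaling and fixed-point activations can fit together. It is not taken from the project: the helper names (`to_q88`, `choose_shift`, `quantize_tensor`, `dot_q88`), the rounding mode, the saturation behavior, and the way the shift is chosen are all assumptions made for illustration.

```python
# Illustrative sketch only: names, rounding, and saturation are assumptions,
# not the project's actual 6502 implementation.

def to_q88(x: float) -> int:
    """Convert a float to Q8.8: signed 16-bit value with 8 fractional bits."""
    v = int(round(x * 256))
    return max(-32768, min(32767, v))  # saturate to the int16 range


def from_q88(v: int) -> float:
    """Convert a Q8.8 integer back to a float."""
    return v / 256.0


def choose_shift(weights) -> int:
    """Pick the largest shift such that max|w| * 2**shift still fits in int8."""
    peak = max(abs(w) for w in weights)
    shift = 0
    while shift < 15 and peak * (1 << (shift + 1)) <= 127:
        shift += 1
    return shift


def quantize_tensor(weights, shift: int):
    """Quantize floats to int8 with one shared shift: w is about q / 2**shift."""
    return [max(-128, min(127, int(round(w * (1 << shift))))) for w in weights]


def dot_q88(q_weights, shift: int, acts_q88) -> int:
    """Dot product of int8 weights and Q8.8 activations, returned as Q8.8."""
    acc = 0
    for q, a in zip(q_weights, acts_q88):
        acc += q * a          # product carries 8 + shift fractional bits
    acc >>= shift             # arithmetic shift back down to 8 fractional bits
    return max(-32768, min(32767, acc))


weights = [0.5, -0.25, 0.125]
shift = choose_shift(weights)
q = quantize_tensor(weights, shift)
acts = [to_q88(a) for a in (1.0, 2.0, -4.0)]
result = from_q88(dot_q88(q, shift, acts))
```

Because the scale is a power of two, rescaling the accumulator needs only a bit shift rather than a multiply or divide, which is the usual reason to prefer shift scaling on a CPU like the 6502 that has no hardware multiplier.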

2026-04-29 Tags: commodore 64, transformer, 6502 assembly, machine learning, quantization, retro computing, llm by klotz
