This paper investigates how large language models (LLMs) solve mental math problems. It proposes that the meaningful computation occurs late in the network (in the deeper layers) and primarily at the last-token position, which receives the information it needs from the other tokens only in a few specific middle layers. The authors introduce two techniques (CAMA and ABP) to identify an 'All-for-One' subgraph responsible for this behavior, and demonstrate both its sufficiency and its necessity for high performance across several models and input styles.
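
To make the reviewed claim concrete, the following is a minimal sketch of the kind of attention-blocking intervention the summary alludes to, not the paper's actual CAMA or ABP procedures. It blocks the last token from attending to any earlier token after an assumed middle "cutoff" layer, so the final position can only use information it has already gathered; the model name, prompt, and cutoff layer are illustrative assumptions.

```python
# Sketch only: a generic last-token attention ablation, not the paper's CAMA/ABP.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # assumed stand-in model
prompt = "23 + 58 = "                              # assumed mental-math prompt
cutoff_layer = 6                                   # assumed "middle" layer

def block_last_token_attention(pattern, hook):
    # pattern: [batch, head, query_pos, key_pos]
    # For the last query position, keep only self-attention.
    pattern = pattern.clone()
    pattern[:, :, -1, :-1] = 0.0
    return pattern

# Apply the ablation to every layer at or above the cutoff.
hooks = [
    (f"blocks.{layer}.attn.hook_pattern", block_last_token_attention)
    for layer in range(cutoff_layer, model.cfg.n_layers)
]

tokens = model.to_tokens(prompt)
clean_logits = model(tokens)
ablated_logits = model.run_with_hooks(tokens, fwd_hooks=hooks)

# Compare the top next-token prediction with and without the ablation.
clean_top = model.tokenizer.decode(clean_logits[0, -1].argmax().item())
ablated_top = model.tokenizer.decode(ablated_logits[0, -1].argmax().item())
print(f"clean: {clean_top!r}  ablated: {ablated_top!r}")
```

If the summary's claim holds for a given model, the ablated prediction should stay close to the clean one when the cutoff is placed after the layers where the last token receives information, and degrade when it is placed before them.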