This paper investigates how large language models (LLMs) solve mental math problems. It proposes that the meaningful computation occurs late in the network (in terms of layer depth) and primarily at the last token, which receives information from the other tokens in specific middle layers. The authors introduce two techniques, CAMA and ABP, to identify an 'All-for-One' subgraph responsible for this behavior, and demonstrate that the subgraph is both sufficient and necessary for high performance across various models and input styles.
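To make the claim concrete, the sketch below shows the general flavor of such an intervention on a toy attention-only model: outside a chosen window of "middle layers", the last token is prevented from attending to earlier positions, so any cross-token information must reach it inside that window. This is a hypothetical illustration, not the paper's CAMA or ABP procedures; the model, layer window, and tensor shapes are invented for the example.

```python
# Hypothetical toy sketch (not the paper's CAMA/ABP): restrict cross-token
# information flow into the last token to a chosen window of middle layers.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_layers, n_tokens, d = 6, 5, 16
W_qkv = [torch.randn(3, d, d) / d**0.5 for _ in range(n_layers)]
window = range(2, 4)  # assumed "middle layers" where transfer is allowed

def run(x, knockout=False):
    for layer in range(n_layers):
        Wq, Wk, Wv = W_qkv[layer]
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / d**0.5
        causal = torch.tril(torch.ones(n_tokens, n_tokens, dtype=torch.bool))
        scores = scores.masked_fill(~causal, float("-inf"))
        if knockout and layer not in window:
            # Outside the window, the last token may only attend to itself.
            scores[-1, :-1] = float("-inf")
        attn = F.softmax(scores, dim=-1)
        x = x + attn @ v  # residual-stream update (attention only, for brevity)
    return x[-1]          # last-token representation

x = torch.randn(n_tokens, d)
full = run(x)
ablated = run(x, knockout=True)
print((full - ablated).norm())  # how much the last-token state changes
```

If the paper's claim holds, restricting cross-token transfer to the right middle layers in this way should leave performance largely intact, while blocking that window should destroy it, which is the spirit of the sufficiency and necessity tests described above.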
This paper demonstrates that the inference pass of several open-weight large language models (LLMs) can be mapped to an exactly equivalent linear system for a given input sequence. It explores the use of the 'detached Jacobian' of that mapping to interpret semantic concepts within LLMs and potentially steer next-token prediction.
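The core trick can be reproduced on a toy, bias-free SwiGLU block: if the input-dependent gate is evaluated and then detached (treated as a constant), the block becomes exactly linear in its input, so the Jacobian of the detached function maps the input to the original output with no approximation. The sketch below is an illustrative example of this idea, not the paper's code; the weights, dimensions, and helper names are made up.

```python
# Minimal sketch of the "detached Jacobian" idea on a toy bias-free SwiGLU block:
# detaching the silu gate makes the map linear in x, so the Jacobian of the
# detached function reproduces the original output exactly at that input.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model, d_ff = 8, 16
W_gate = torch.randn(d_ff, d_model)
W_up = torch.randn(d_ff, d_model)
W_down = torch.randn(d_model, d_ff)

def swiglu(x):
    # Standard bias-free SwiGLU MLP: nonlinear in x because of the silu gate.
    return W_down @ (F.silu(W_gate @ x) * (W_up @ x))

def swiglu_detached(x):
    # Same computation, but the gate is treated as a constant (detached),
    # so the function is linear in x with no bias term.
    gate = F.silu(W_gate @ x).detach()
    return W_down @ (gate * (W_up @ x))

x0 = torch.randn(d_model)
J = torch.autograd.functional.jacobian(swiglu_detached, x0)
print(torch.allclose(J @ x0, swiglu(x0), atol=1e-5))  # True: J @ x0 == f(x0)
```

In the paper's setting, the analogous detachment is applied across the full network, so that the entire forward pass at a given input sequence is reproduced by a single input-dependent linear operator, which can then be inspected or manipulated.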