DeepScientist is a goal-oriented, fully autonomous scientific discovery system. It combines Bayesian Optimization with a hierarchical 'hypothesize, verify, and analyze' loop over a Findings Memory to balance exploration and exploitation. The system generated and validated thousands of scientific ideas, surpassing the human state of the art on three AI tasks.
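The sketch below illustrates, under my own assumptions rather than DeepScientist's actual implementation, how a Findings Memory of past hypotheses and their scores could feed a Bayesian Optimization step that picks the next hypothesis to verify; the feature vectors, scores, and the UCB acquisition rule are all illustrative stand-ins.

```python
# Illustrative sketch: Bayesian Optimization over a Findings Memory (assumed design,
# not the paper's code). Past findings are feature vectors with measured scores; a GP
# surrogate plus a UCB acquisition picks the next candidate to verify.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Findings Memory: features of previously verified hypotheses and their task scores.
past_features = rng.normal(size=(20, 8))        # e.g. embeddings of tried ideas (assumed)
past_scores = rng.uniform(0.5, 0.8, size=20)    # e.g. benchmark results (assumed)

# Surrogate model over the hypothesis space.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(past_features, past_scores)

# Candidate hypotheses from the "hypothesize" stage.
candidates = rng.normal(size=(100, 8))
mean, std = gp.predict(candidates, return_std=True)

# Upper Confidence Bound: mean term exploits, std term explores.
kappa = 1.5
ucb = mean + kappa * std
next_idx = int(np.argmax(ucb))
print(f"Next hypothesis to verify: candidate {next_idx}, UCB={ucb[next_idx]:.3f}")
```

The selected candidate would then pass to the "verify" stage, and its measured score would be written back into the Findings Memory for the next round.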
This paper investigates how large language models (LLMs) solve mental math problems. It proposes that the meaningful computation occurs late in the network (in terms of layer depth) and primarily at the last token, which gathers information from the other tokens in specific middle layers. The authors introduce two techniques, CAMA and ABP, to identify an 'All-for-One' subgraph responsible for this behavior, and show that it is both sufficient and necessary for high performance across various models and input styles.
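As a rough illustration of the kind of experiment such a claim invites (my assumptions, not the paper's exact CAMA or ABP procedures), one can mean-ablate the residual stream at every position except the last token after a chosen middle layer and check whether the model still answers the arithmetic prompt; the model (gpt2), the cutoff layer, and the prompt are placeholders.

```python
# Sketch of a last-token sufficiency test (assumed setup, not the paper's method):
# after a chosen "middle" layer, replace all non-final positions in the residual
# stream with their mean, so later layers can only use what the last token holds.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # stand-in model for illustration
prompt = "23 + 45 = "
tokens = model.to_tokens(prompt)

# Cache clean activations to get per-layer mean vectors for the ablation.
_, cache = model.run_with_cache(tokens)

CUTOFF_LAYER = 8  # assumed middle layer after which only the last token should matter

def ablate_non_last(resid, hook):
    # Overwrite every position except the last token with this layer's mean activation.
    mean_vec = cache[hook.name].mean(dim=1, keepdim=True)
    resid[:, :-1, :] = mean_vec
    return resid

hooks = [
    (f"blocks.{layer}.hook_resid_post", ablate_non_last)
    for layer in range(CUTOFF_LAYER, model.cfg.n_layers)
]
patched_logits = model.run_with_hooks(tokens, fwd_hooks=hooks)
top_token = patched_logits[0, -1].argmax().item()
print(f"Prediction with non-last positions ablated after layer {CUTOFF_LAYER}: "
      f"{model.tokenizer.decode(top_token)!r}")
```

If accuracy survives such an ablation only when the cutoff is placed after the middle layers, that would be consistent with the paper's picture of information flowing to the last token there and the late-layer computation happening at that position alone.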