This paper challenges the traditional "singularity" concept of a single, all-powerful AI, proposing instead that the next intelligence explosion will be plural, social, and deeply intertwined with human intelligence. Drawing on recent advances in agentic AI, the authors argue that intelligence fundamentally involves the interaction of diverse perspectives and emerges from social organization. They present evidence of "societies of thought" within reasoning models, where internal debates and multi-agent interactions improve accuracy. The paper draws parallels to previous intelligence explosions and emphasizes the importance of scaling not just computational power but also the social infrastructure (institutions, norms, and protocols) that governs these systems.
This post discusses a study finding that refusal behavior in language models is mediated by a single direction in the model's residual stream. The study presents an intervention that bypasses refusal by ablating this direction, and shows that adding this direction to the residual stream induces refusal. The work was carried out as part of a scholars program, and a forthcoming paper will provide more details.
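To make the two interventions concrete, here is a minimal sketch in plain Python. It assumes the refusal direction is already known and treats a single residual-stream activation as a list of floats. The function names `ablate_direction` and `add_direction`, the `scale` parameter, and the toy vectors are illustrative assumptions, not taken from the study, which may implement these steps differently.

```python
import math


def unit(v):
    """Return v rescaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ablate_direction(activation, direction):
    """Remove the component of `activation` that lies along `direction`.

    Computes x' = x - (x . r) r, where r is the unit-normalized direction.
    """
    r = unit(direction)
    coeff = dot(activation, r)
    return [a - coeff * ri for a, ri in zip(activation, r)]


def add_direction(activation, direction, scale=1.0):
    """Shift `activation` along `direction` by `scale` (hypothetical knob).

    Computes x' = x + scale * r, where r is the unit-normalized direction.
    """
    r = unit(direction)
    return [a + scale * ri for a, ri in zip(activation, r)]


# Toy 3-dimensional example; real residual streams have thousands of dimensions.
activation = [2.0, -1.0, 0.5]
direction = [1.0, 0.0, 0.0]  # stand-in for the (assumed known) refusal direction

ablated = ablate_direction(activation, direction)
steered = add_direction(activation, direction, scale=3.0)
```

After ablation the activation has no component left along the chosen direction, while the addition step pushes the activation further along it. In a real model these operations would be applied to residual-stream activations during the forward pass rather than to a standalone vector.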