AI safety and alignment research has predominantly focused on methods for safeguarding individual AI systems, resting on the assumption that a monolithic Artificial General Intelligence (AGI) will eventually emerge. The alternative AGI emergence hypothesis, in which general capability first manifests through coordination among groups of sub-AGI agents with complementary skills and affordances, has received far less attention. Here we argue that this patchwork AGI hypothesis deserves serious consideration and should inform the development of corresponding safeguards and mitigations.
Abstract of the report "Multi-Agent Risks from Advanced AI":
>"The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, as well as seven key risk factors (information asymmetries, network effects, selection pressures, destabilising dynamics, commitment problems, emergent agency, and multi-agent security) that can underpin them. We highlight several important instances of each risk, as well as promising directions to help mitigate them. By anchoring our analysis in a range of real-world examples and experimental evidence, we illustrate the distinct challenges posed by multi-agent systems and their implications for the safety, governance, and ethics of advanced AI."
Sam Altman discusses the imminent arrival of digital superintelligence, its potential impacts on society, and the future of technological progress. He highlights the rapid advancements in AI, the economic and scientific benefits, and the challenges of ensuring safety and equitable access.
Symbotic Inc. acquires Veo Robotics Inc. for $8.7M, gaining its FreeMove technology, which enhances robot safety and productivity. Key Veo executives join Symbotic.
This post discusses a study finding that refusal behavior in language models is mediated by a single direction in the model's residual stream. The study presents an intervention that bypasses refusal by ablating this direction from the activations, and shows that adding the direction back in induces refusal. The work was done as part of a scholars program, with more details to appear in a forthcoming paper.
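The intervention described amounts to simple linear algebra on residual-stream activations: project out the refusal direction to bypass refusal, or add it back in to induce refusal. Below is a minimal numpy sketch of that arithmetic. The function names, toy dimensions, and steering coefficient are illustrative assumptions, not the study's implementation; in practice such edits are applied to the model's internal activations during generation (e.g. via forward hooks), not to standalone arrays.

```python
import numpy as np

def ablate_direction(h: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of hidden states h along direction r.

    h: (..., d_model) residual-stream activations
    r: (d_model,) candidate 'refusal direction'
    """
    r_hat = r / np.linalg.norm(r)          # unit vector along the direction
    proj = (h @ r_hat)[..., None] * r_hat  # component of h along r_hat
    return h - proj                        # h with that component zeroed out

def add_direction(h: np.ndarray, r: np.ndarray, alpha: float) -> np.ndarray:
    """Steer hidden states by adding alpha * r_hat (positive alpha pushes toward refusal)."""
    r_hat = r / np.linalg.norm(r)
    return h + alpha * r_hat

# toy usage: 2 token positions in an 8-dimensional residual stream
rng = np.random.default_rng(0)
h = rng.normal(size=(2, 8))
r = rng.normal(size=8)

h_ablated = ablate_direction(h, r)
# after ablation, h has no component along the direction
assert np.allclose(h_ablated @ (r / np.linalg.norm(r)), 0.0)

h_steered = add_direction(h, r, alpha=2.0)
```

The key property is that ablation is a projection, so it only removes information along the single direction while leaving the orthogonal complement of the activations untouched.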