This paper introduces Cross-Layer Attention (CLA), an extension of Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) for reducing the size of the key-value cache in transformer-based autoregressive large language models (LLMs). The authors demonstrate that CLA can reduce the cache size by another 2x while maintaining nearly the same accuracy as unmodified MQA, enabling inference with longer sequence lengths and larger batch sizes.
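The key idea behind CLA is that adjacent layers share a single set of key/value activations instead of each computing their own, so with a sharing factor of 2 only every other layer writes to the KV cache. Below is a minimal PyTorch-style sketch of that sharing pattern; the class and argument names (CLAAttention, kv_producer, shared_kv) are illustrative rather than taken from the paper's code, and the sketch runs a full-sequence forward pass rather than maintaining an incremental decoding cache.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLAAttention(nn.Module):
    """One decoder attention sublayer. When kv_producer is False, the layer
    owns no K/V projections and reuses the (k, v) tensors handed forward from
    the previous layer, so only producer layers contribute to the KV cache."""

    def __init__(self, d_model: int, n_heads: int, kv_producer: bool):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        self.kv_producer = kv_producer
        if kv_producer:
            # MQA-style: a single K/V head shared by all query heads.
            self.k_proj = nn.Linear(d_model, self.d_head, bias=False)
            self.v_proj = nn.Linear(d_model, self.d_head, bias=False)

    def forward(self, x, shared_kv=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        if self.kv_producer:
            k = self.k_proj(x).view(b, t, 1, self.d_head).transpose(1, 2)
            v = self.v_proj(x).view(b, t, 1, self.d_head).transpose(1, 2)
        else:
            k, v = shared_kv  # cross-layer reuse: this is the CLA step
        # Broadcast the single KV head across all query heads (views, no copy).
        kq = k.expand(-1, self.n_heads, -1, -1)
        vq = v.expand(-1, self.n_heads, -1, -1)
        out = F.scaled_dot_product_attention(q, kq, vq, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out), (k, v)

# Usage: layers alternate producer/consumer (sharing factor 2), so the KV
# cache holds entries for only half the layers.
layers = nn.ModuleList(
    [CLAAttention(512, 8, kv_producer=(i % 2 == 0)) for i in range(4)]
)
x = torch.randn(2, 16, 512)
kv = None
for layer in layers:
    y, kv = layer(x, shared_kv=kv)
    x = x + y  # residual connection
```

Because the consumer layers reuse the producer's (k, v) views, they store no new tensors, which is where the roughly 2x cache reduction in the summary above comes from under this assumed sharing factor.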
MIT CSAIL researchers have developed three neurosymbolic frameworks - LILO, Ada, and LGA - that use natural language to help large language models (LLMs) build better abstractions for coding, AI planning, and robotics tasks.
This article discusses the MIT Artificial Intelligence (AI) Lab's 'Tourist Policy' and how it impacted students' access to its resources. As a high school student in Maryland, the author shares their experience of using the lab's PDP-10s over the ARPANET and how it inspired them to learn and contribute to the MIT community.
Acknowledgments. We gratefully acknowledge the efforts of all those members of the Boxer Groups at MIT and Berkeley who have helped to make Boxer (almost) a reality. Special thanks to Michael Eisenberg, Gregor Kiczales, Leigh Klotz, Ed Lay, and Jeremy Roschelle.