klotz: attention* + transformer*


  1. This paper introduces Cross-Layer Attention (CLA), an extension of Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) that shrinks the key-value cache in transformer-based autoregressive large language models (LLMs) by sharing key/value activations between adjacent layers. The authors show that CLA cuts the cache size by a further 2x while maintaining nearly the same accuracy as unmodified MQA, enabling inference with longer sequence lengths and larger batch sizes (a minimal sketch of the sharing pattern follows this list).
  2. Combined with the growing trend toward multimodality (models that combine language, vision, and other capabilities), we may see AI models operating more like a committee of specialized components than a monolithic block. This approach has many conceptual similarities to ideas described by Marvin Minsky and Seymour Papert in the early days of AI.
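
The KV-sharing idea summarized in the first bookmark is easy to sketch. Below is a minimal PyTorch illustration assuming the paper's sharing factor of 2 (pairs of adjacent layers share one KV cache) stacked on top of MQA-style single-KV-head attention; all names here (ClaAttention, ClaDecoder, kv_proj, and so on) are hypothetical and not taken from the paper's code, and real concerns such as incremental decoding, norms, and MLP blocks are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ClaAttention(nn.Module):
    """Causal self-attention that either projects its own K/V (MQA-style,
    a single KV head) or reuses K/V handed down from an earlier layer (CLA)."""

    def __init__(self, d_model: int, n_heads: int, owns_kv: bool):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.owns_kv = owns_kv
        self.q_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)
        if owns_kv:
            # One shared KV head, as in MQA; CLA is layered on top of MQA.
            self.kv_proj = nn.Linear(d_model, 2 * self.d_head)

    def forward(self, x, shared_kv=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        if self.owns_kv:
            k, v = self.kv_proj(x).split(self.d_head, dim=-1)
            k = k.unsqueeze(1)  # (b, 1, t, d_head): the single KV head
            v = v.unsqueeze(1)
        else:
            # CLA: no KV projection here; reuse the cache from the layer below.
            k, v = shared_kv
        out = F.scaled_dot_product_attention(
            q, k.expand_as(q), v.expand_as(q), is_causal=True
        )
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out), (k, v)


class ClaDecoder(nn.Module):
    """Stack in which even layers project fresh K/V and odd layers reuse them,
    so only half as many distinct KV caches need to be stored."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, n_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [ClaAttention(d_model, n_heads, owns_kv=(i % 2 == 0))
             for i in range(n_layers)]
        )

    def forward(self, x):
        kv = None
        for layer in self.layers:
            h, kv = layer(x, shared_kv=kv)  # owning layers refresh kv
            x = x + h  # residual connection; norms and MLPs omitted
        return x


if __name__ == "__main__":
    model = ClaDecoder()
    out = model(torch.randn(2, 16, 256))
    print(out.shape)  # torch.Size([2, 16, 256])

The memory saving comes from the else branch: non-owning layers allocate no KV projection and, at inference time, would store no KV cache of their own, which is where the roughly 2x cache reduction described in the bookmark originates.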
