klotz: self-attention* + machine learning* + deep learning*

0 bookmark(s) - Sort by: Date ↓ / Title / - Bookmarks from other users for this tag

  1. This article provides a beginner-friendly explanation of attention mechanisms and transformer models, covering sequence-to-sequence modeling, the limitations of RNNs, the concept of attention, and how transformers address these limitations with self-attention and parallelization.
  2. The self-attention mechanism is used to capture interactions between words within input and output sequences. It involves computing keys, queries, and values vectors, followed by matrix multiplications and a softmax transformation to produce an attention matrix.
  3. Explore the intricacies of the attention mechanism responsible for fueling the transformers.
  4. A detailed explanation of the Transformer model, a key architecture in modern deep learning for tasks like neural machine translation, focusing on components like self-attention, encoder and decoder stacks, positional encoding, and training.
  5. This article is part of a series titled ‘LLMs from Scratch’, a complete guide to understanding and building Large Language Models (LLMs). In this article, we discuss the self-attention mechanism and how it is used by transformers to create rich and context-aware transformer embeddings.

    The Self-Attention mechanism is used to add context to learned embeddings, which are vectors representing each word in the input sequence. The process involves the following steps:

    1. Learned Embeddings: These are the initial vector representations of words, learned during the training phase. The weights matrix, storing the learned embeddings, is stored in the first linear layer of the Transformer architecture.

    2. Positional Encoding: This step adds positional information to the learned embeddings. Positional information helps the model understand the order of the words in the input sequence, as transformers process all words in parallel, and without this information, they would lose the order of the words.

    3. Self-Attention: The core of the Self-Attention mechanism is to update the learned embeddings with context from the surrounding words in the input sequence. This mechanism determines which words provide context to other words, and this contextual information is used to produce the final contextualized embeddings.

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: Tags: self-attention + machine learning + deep learning

About - Propulsed by SemanticScuttle