klotz: training* + llm* + deep learning*

0 bookmark(s) - Sort by: Date ↓ / Title / - Bookmarks from other users for this tag

  1. This paper presents a method to accelerate the grokking phenomenon, where a model's generalization improves with more training iterations after an initial overfitting stage. The authors propose a simple algorithmic modification to existing optimizers that filters out the fast-varying components of the gradients and amplifies the slow-varying components, thereby accelerating the grokking effect.
  2. This article discusses the process of training a large language model (LLM) using reinforcement learning from human feedback (RLHF) and a new alternative method called Direct Preference Optimization (DPO). The article explains how these methods help align the LLM with human expectations and make it more efficient.
  3. Delving into transformer networks

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: Tags: training + llm + deep learning

About - Propulsed by SemanticScuttle