SemanticScuttle - klotz.me » klotz: deep learning+llm+training+grokking+optimization techniques

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

This paper presents a method to accelerate grokking, the phenomenon in which a model's generalization keeps improving with further training iterations long after it has overfit the training data. The authors propose a simple algorithmic modification to existing optimizers: filter out the fast-varying components of the gradients and amplify the slow-varying components, so that generalization sets in after fewer iterations.
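As a rough illustration of the idea, the sketch below uses an exponential moving average (EMA) as the low-pass filter: the running average tracks the slow-varying part of the gradient, and a scaled copy of it is added back to the raw gradient before the optimizer step. This is only one possible filter, written in plain Python under stated assumptions; the function name `ema_filter_step`, the parameter names `alpha` and `lamb`, and their default values are illustrative and are not taken from the summary above.

```python
def ema_filter_step(grad, slow, alpha=0.98, lamb=2.0):
    """Apply one step of an EMA-based gradient filter (illustrative sketch).

    grad  : list of floats, the raw gradient at this iteration
    slow  : list of floats, the running slow component from the previous step
    alpha : EMA decay (assumed value); closer to 1 means a slower filter
    lamb  : amplification factor (assumed value) for the slow component
    Returns (filtered_grad, new_slow).
    """
    # Low-pass filter: exponential moving average of the gradients seen so far.
    new_slow = [alpha * s + (1.0 - alpha) * g for s, g in zip(slow, grad)]
    # Amplify the slow component by adding it, scaled by lamb, to the raw gradient.
    filtered = [g + lamb * s for g, s in zip(grad, new_slow)]
    return filtered, new_slow


if __name__ == "__main__":
    # A constant gradient is "slow", so its filtered value grows over time;
    # a sign-flipping gradient is "fast", so its slow component stays small.
    slow_const, slow_flip = [0.0], [0.0]
    for step in range(50):
        f_const, slow_const = ema_filter_step([1.0], slow_const)
        f_flip, slow_flip = ema_filter_step([(-1.0) ** step], slow_flip)
    print("constant gradient, filtered:", f_const[0])
    print("alternating gradient, slow component:", slow_flip[0])
```

In an actual training loop the filtered gradient would replace the raw gradient just before the existing optimizer's update, which is why the approach can sit on top of an unmodified optimizer.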

2024-08-19 Tags: grokking, deep learning, optimization techniques, gradient filtering, llm, training, eric hartford by klotz
