Gemma Scope is a research tool for interpreting the inner workings of the Gemma 2 generative AI models, letting researchers examine what happens in individual model layers as a request is processed.
This paper explores the structure of the feature point cloud discovered by sparse autoencoders in large language models, investigating it at three scales: atomic, brain, and galaxy. At the atomic scale, features form crystal structures built from parallelograms or trapezoids, and the quality of these crystals improves when distractor dimensions are projected out. At the brain scale, the point cloud shows modular structure, with functionally related features clustering into regions analogous to neural lobes. At the galaxy scale, the paper examines the overall shape and clustering of the point cloud.
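To make the atomic-scale idea concrete, here is a minimal sketch of a parallelogram test on toy feature vectors. All vectors and the `project_out` helper are hypothetical illustrations (not from Gemma Scope or the paper): two meaningful directions plus one "distractor" dimension that, once projected out, tightens the parallelogram formed by an analogy like (man, woman, king, queen).

```python
import numpy as np

# Hypothetical toy feature directions: "gender" and "royalty" are the
# meaningful axes; "length" plays the role of a distractor dimension.
gender  = np.array([1.0, 0.0, 0.0])
royalty = np.array([0.0, 1.0, 0.0])
length  = np.array([0.0, 0.0, 1.0])  # distractor

# Toy feature points contaminated by the distractor (values invented).
man   = 0.0 * gender + 0.0 * royalty + 3.0 * length
woman = 1.0 * gender + 0.0 * royalty + 5.0 * length
king  = 0.0 * gender + 1.0 * royalty + 4.0 * length
queen = 1.0 * gender + 1.0 * royalty + 5.0 * length

# Parallelogram test: in a perfect crystal, woman - man == queen - king.
raw_gap = np.linalg.norm((woman - man) - (queen - king))

def project_out(v, d):
    """Remove the component of v along direction d (hypothetical helper)."""
    d = d / np.linalg.norm(d)
    return v - (v @ d) * d

# Re-test after projecting out the distractor dimension.
m, w, k, q = (project_out(v, length) for v in (man, woman, king, queen))
clean_gap = np.linalg.norm((w - m) - (q - k))

print(raw_gap, clean_gap)  # the gap shrinks once the distractor is removed
```

In this toy setup the distractor alone breaks the parallelogram; projecting it out restores the analogy exactly, mirroring the paper's observation that crystal structures sharpen after removing distractor dimensions.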
DeepMind's Gemma Scope gives researchers a collection of sparse autoencoders for studying how the Gemma 2 language models work. By exposing the models' inner workings, it supports research into problems such as hallucinations and potential manipulation.