This article explores mechanistic interpretability, a field that seeks to understand how large language models (LLMs) work internally by reverse-engineering the computations they perform. It surveys techniques for identifying and analyzing the functions of individual neurons and circuits within these models, offering insight into how they arrive at their outputs.
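As a concrete illustration of one such technique, the sketch below records the activations of a single MLP dimension across a few prompts and reports the tokens that activate it most strongly. The model name, layer index, and neuron index are placeholders, and the `model.model.layers[i].mlp` path assumes a Gemma-style decoder layout in Hugging Face `transformers`; treat this as a minimal sketch under those assumptions, not a prescribed workflow.

```python
# Minimal sketch of neuron-level analysis: find the tokens that most strongly
# activate one dimension of a layer's MLP output. Names and indices are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "google/gemma-2-2b"   # placeholder; any causal LM with this layout works
LAYER, NEURON = 12, 42             # arbitrary choices for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

activations = []

def hook(module, inputs, output):
    # output: (batch, seq_len, d_model) activations produced by this layer's MLP
    activations.append(output.detach())

# Path assumes a Gemma-2-style decoder stack: model.model.layers[i].mlp
handle = model.model.layers[LAYER].mlp.register_forward_hook(hook)

prompts = ["The Eiffel Tower is in Paris.", "Water boils at 100 degrees Celsius."]
records = []
with torch.no_grad():
    for prompt in prompts:
        activations.clear()
        ids = tokenizer(prompt, return_tensors="pt")
        model(**ids)
        acts = activations[0][0, :, NEURON]      # this dimension's value at each position
        for pos, value in enumerate(acts.tolist()):
            token = tokenizer.decode(ids["input_ids"][0, pos])
            records.append((value, token, prompt))

handle.remove()
# Tokens with the highest activation hint at what this dimension responds to.
for value, token, prompt in sorted(records, reverse=True)[:10]:
    print(f"{value:8.3f}  {token!r}  in: {prompt}")
```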
DeepMind's Gemma Scope is a collection of sparse autoencoders that gives researchers tools to better understand how the Gemma 2 language models work. By decomposing the models' internal activations into interpretable features, it supports research into the inner workings of these models and into concerns such as hallucination and potential manipulation.
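To make the idea of a sparse autoencoder concrete, the sketch below shows the basic structure such a model typically has: an activation vector from one layer of the LLM is encoded into a much wider, mostly-zero latent vector and then decoded back, with an L1 penalty encouraging sparsity. The dimensions, the plain ReLU + L1 objective, and the toy data are illustrative assumptions, not the actual Gemma Scope recipe (its released SAEs use a JumpReLU variant trained on Gemma 2 activations).

```python
# Minimal sketch of a sparse autoencoder (SAE) over LLM activations.
# Dimensions and the ReLU + L1 objective are illustrative, not the Gemma Scope configuration.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 2304, d_sae: int = 16384):
        super().__init__()
        # Encode a d_model activation vector into a much wider, mostly-zero latent.
        self.encoder = nn.Linear(d_model, d_sae)
        # Decode the sparse latent back to the original activation space.
        self.decoder = nn.Linear(d_sae, d_model)

    def forward(self, acts: torch.Tensor):
        latent = torch.relu(self.encoder(acts))   # sparse feature activations
        recon = self.decoder(latent)              # reconstruction of the input
        return recon, latent


def sae_loss(acts, recon, latent, l1_coeff: float = 1e-3):
    # Reconstruction error keeps the SAE faithful to the model's activations;
    # the L1 term pushes most latent features to zero, so each input is
    # explained by a handful of (hopefully interpretable) features.
    mse = (recon - acts).pow(2).mean()
    sparsity = latent.abs().sum(dim=-1).mean()
    return mse + l1_coeff * sparsity


# Toy usage on random "activations"; in practice these would be residual-stream
# or MLP activations collected from a Gemma 2 model.
sae = SparseAutoencoder()
acts = torch.randn(8, 2304)
recon, latent = sae(acts)
loss = sae_loss(acts, recon, latent)
loss.backward()
print(loss.item(), (latent > 0).float().mean().item())  # loss and fraction of active features
```

Each latent dimension of a trained SAE is a candidate "feature": inspecting which inputs activate it, as in the neuron example above, is how researchers attach human-readable descriptions to the model's internal representations.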