SemanticScuttle - klotz.me » klotz: interpretability

klotz: interpretability*

Mapping the latent space of Llama 3.3 70B

Sparse autoencoders (SAEs) were trained on Llama 3.3 70B, and the resulting interpreted model has been released through an API, enabling research and product development through feature space exploration and steering.

2024-12-25 Tags: llm, llama 3.3, sparse autoencoders, sae, latent space, features, xai, api, interpretability by klotz

Explaining Machine Learning Models: A Non-Technical Guide to Interpreting SHAP Analyses

This article provides a non-technical guide to interpreting SHAP analyses, useful for explaining machine learning models to non-technical stakeholders, with a focus on both local and global interpretability using various visualization methods.

2024-11-25 Tags: shap, machine learning, interpretability, data science, xai by klotz

Perform outlier detection more effectively using subsets of features

The article discusses techniques to improve outlier detection in tabular data by using subsets of features, known as subspaces, which can reduce the curse of dimensionality, increase interpretability, and allow for more efficient execution and tuning over time.

2024-11-25 Tags: outlier detection, subspace, dimensionality, feature subset, interpretability, pyod, data science by klotz
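
The subspace idea summarized in the entry above can be illustrated with a minimal sketch. This is not the article's code and does not use PyOD; the scoring rule (Euclidean norm of per-feature z-scores inside each feature subset, averaged over all subsets), the function names, and the toy data are all assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscores(column):
    """Standardize one feature column; a constant column gets sigma 1.0."""
    mu = mean(column)
    sigma = pstdev(column) or 1.0
    return [(value - mu) / sigma for value in column]


def subspace_scores(rows, subspace_size=2):
    """Score each row in every feature subset, then average the scores.

    Hypothetical scoring rule: the norm of the row's z-scores restricted
    to the features of the subspace.
    """
    n_features = len(rows[0])
    z_cols = [zscores([row[j] for row in rows]) for j in range(n_features)]
    subspaces = list(combinations(range(n_features), subspace_size))
    scores = []
    for i in range(len(rows)):
        per_subspace = [
            sum(z_cols[j][i] ** 2 for j in subspace) ** 0.5
            for subspace in subspaces
        ]
        scores.append(mean(per_subspace))
    return scores


# Toy data: the last row is unusual only in the first feature.
rows = [
    [1.0, 9.9, 99.0],
    [1.1, 10.1, 99.0],
    [0.9, 9.9, 101.0],
    [1.0, 10.1, 101.0],
    [1.1, 9.9, 99.0],
    [0.9, 10.1, 99.0],
    [1.0, 9.9, 101.0],
    [5.0, 10.1, 101.0],
]
scores = subspace_scores(rows)
most_unusual = max(range(len(rows)), key=scores.__getitem__)
print(most_unusual, round(scores[most_unusual], 3))
```

Because each score comes from a small feature subset, the per-subspace values can also be inspected individually to see which features made a row look unusual, which is the interpretability benefit the summary mentions.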

Gemma Scope | NeuronPEDIA

Gemma Scope is an open suite of sparse autoencoders trained on Gemma 2 language models; this Neuronpedia page offers an interactive way to explore the features they learn.

2024-08-02 Tags: gemma scope, gemma, llm, neuropedias, interpretability, xai, deep learning by klotz

Gemma Scope: helping the safety community shed light on the inner workings of language models

DeepMind's Gemma Scope is a collection of sparse autoencoders that gives researchers tools to better understand how Gemma 2 language models work internally, which can help address concerns such as hallucinations and potential manipulation.

2024-11-14 Tags: llm, interpretability, gemma scope, autoencoder, deepmind, visualization, xai, analysis by klotz

Refusal in LLMs is mediated by a single direction

This post describes a study finding that refusal behavior in language models is mediated by a single direction in the model's residual stream. The study presents an intervention that bypasses refusal by ablating this direction, and shows that adding the direction back in induces refusal. The work was done as part of a scholars program, and more details are promised in a forthcoming paper.

2024-06-10 Tags: large language model, refusal, interpretability, ai alignment, safety, fine-tuning by klotz
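
The two interventions named in the entry above, ablating a direction and adding it back, can be sketched on a single activation vector. This is a minimal illustration in plain Python, not the study's code: the function names, the toy three-dimensional vectors, and the scale value are assumptions, and a real residual stream would be a high-dimensional tensor edited inside the model's forward pass.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [c / norm for c in v]


def ablate_direction(activation, direction):
    """Remove the component of `activation` along `direction`:
    x' = x - (x . r) r, with r the unit-length direction."""
    r = unit(direction)
    coeff = dot(activation, r)
    return [a - coeff * c for a, c in zip(activation, r)]


def add_direction(activation, direction, scale):
    """Push `activation` along `direction`: x' = x + scale * r."""
    r = unit(direction)
    return [a + scale * c for a, c in zip(activation, r)]


# Hypothetical toy values standing in for a residual-stream activation
# and an extracted "refusal direction".
activation = [2.0, -1.0, 0.5]
direction = [1.0, 0.0, 0.0]

ablated = ablate_direction(activation, direction)
steered = add_direction(activation, direction, 3.0)
print(ablated, steered)
```

After ablation the activation has no component left along the direction, while the components orthogonal to it are unchanged; that is the sense in which the intervention targets a single direction.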

The Meaning of Explainability for AI

An article discussing the importance of explainability in machine learning and the challenges posed by neural networks. It highlights the difficulties in understanding the decision-making process of complex models and the need for more transparency in AI development.

2024-06-04 Tags: explainability, machine learning, neural networks, xai, interpretability by klotz
