klotz: explainability*

  1. This article explains the concept and use of Friedman's H-statistic for finding interactions in machine learning models.

    - The H-statistic is a non-parametric measure that works well with ordinal variables and is useful when the interaction is not linear.
    - It compares the average rank of the response variable at each level of the predictor variable, considering all possible pairs of levels.
    - The sum of these rank differences is normalized by the total number of observations and by the number of levels in the predictor variable.
    - The higher the H-statistic, the stronger the interaction effect.
    - The article walks through the calculation step by step, using a hypothetical dataset on the effect of asbestos exposure on lung cancer in smokers and non-smokers (a code sketch of the standard formulation appears after this list of bookmarks).
    - The author also discusses the assumptions and limitations of the H-statistic, such as the need for balanced data and the inability to detect interactions among more than two variables.
  2. Additive Decision Trees are a variation of standard decision trees, constructed in a way that can often allow them to be more accurate, more interpretable, or both. This article explains the intuition behind Additive Decision Trees and how they can be constructed.
  3. This paper explores whether some language model representations may be inherently multi-dimensional, contrasting the linear representation hypothesis. The authors develop a method using sparse autoencoders to find multi-dimensional features in GPT-2 and Mistral 7B. They find interpretable examples such as circular features representing days of the week and months of the year, which are used to solve computational problems involving modular arithmetic.
  4. "scaling sparse autoencoders has been a major priority of the Anthropic interpretability team, and we're pleased to report extracting high-quality features from Claude 3 Sonnet, 1 Anthropic's medium-sized production model.

    We find a diversity of highly abstract features. They both respond to and behaviorally cause abstract behaviors. Examples of features we find include features for famous people, features for countries and cities, and features tracking type signatures in code. Many features are multilingual (responding to the same concept across languages) and multimodal (responding to the same concept in both text and images), as well as encompassing both abstract and concrete instantiations of the same idea (such as code with security vulnerabilities, and abstract discussion of security vulnerabilities)."
  5. Generating counterfactual explanations got a lot easier with CFNOW, but what are counterfactual explanations, and how can I use them?
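
Regarding bookmark 1: the article's own step-by-step example is not reproduced here. As a point of reference, the sketch below computes the two-way H-statistic in its standard partial-dependence formulation (Friedman & Popescu, 2008), where values near 0 indicate no interaction and values near 1 a strong one. The dataset, the `GradientBoostingRegressor` model, and the feature indices are illustrative assumptions, not taken from the bookmarked article.

```python
# Minimal sketch of the partial-dependence form of Friedman's H-statistic.
# Dataset, model, and feature pairs are illustrative choices, not the article's.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

def centered_partial_dependence(model, X, cols):
    """Empirical partial dependence of `model` on the features in `cols`,
    evaluated at each row of X and centered to mean zero."""
    pd_vals = np.empty(len(X))
    for i, row in enumerate(X):
        X_mod = X.copy()
        X_mod[:, cols] = row[cols]        # clamp the chosen features to row i's values
        pd_vals[i] = model.predict(X_mod).mean()
    return pd_vals - pd_vals.mean()

def h_statistic_squared(model, X, j, k):
    """Two-way H^2 for features j and k: the share of the joint partial
    dependence variance not explained by the two one-way effects."""
    pd_jk = centered_partial_dependence(model, X, [j, k])
    pd_j = centered_partial_dependence(model, X, [j])
    pd_k = centered_partial_dependence(model, X, [k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)

# Friedman #1 has a genuine x0*x1 interaction, so H^2(0, 1) should be clearly
# larger than H^2 for a non-interacting pair such as (2, 3).
X, y = make_friedman1(n_samples=300, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)
X_eval = X[:100]                          # subsample to keep the O(n^2) loop cheap
print("H^2(0, 1):", h_statistic_squared(model, X_eval, 0, 1))
print("H^2(2, 3):", h_statistic_squared(model, X_eval, 2, 3))
```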
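
Regarding bookmark 5: a counterfactual explanation is a minimal change to an input that flips the model's prediction. The sketch below illustrates the idea with a naive greedy search over a linear classifier; it does not use CFNOW's API, and the dataset, model, and step size are arbitrary illustrative choices.

```python
# Toy counterfactual search: nudge one feature until the prediction flips.
# Illustrative only; CFNOW performs its own, more sophisticated search.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def greedy_counterfactual(model, x, step=0.1, max_iter=500):
    """Nudge the most influential feature of a linear classifier until the
    predicted class flips; returns the counterfactual point and its delta."""
    x_cf = x.astype(float).copy()
    original = model.predict(x.reshape(1, -1))[0]
    coefs = model.coef_[0]                 # linear model: coefficients as sensitivities
    idx = int(np.argmax(np.abs(coefs)))    # single most influential feature
    # Move against the decision function if the original class is 1, with it otherwise.
    direction = -np.sign(coefs[idx]) if original == 1 else np.sign(coefs[idx])
    for _ in range(max_iter):
        if model.predict(x_cf.reshape(1, -1))[0] != original:
            break
        x_cf[idx] += step * direction
    return x_cf, x_cf - x

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)
x_cf, delta = greedy_counterfactual(clf, X[0])
print("original class:", clf.predict(X[:1])[0],
      "counterfactual class:", clf.predict(x_cf.reshape(1, -1))[0])
print("feature changes:", delta)
```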
