klotz: explainability*

  1. This article explains the concept and use of Friedman's H-statistic for finding interactions in machine learning models.

    - The H-statistic is a non-parametric measure that works well with ordinal variables and is useful when the interaction is not linear.
    - It compares the average rank of the response variable at each level of the predictor variable, considering all possible pairs of levels.
    - The sum of these rank differences is normalized by the total number of observations and by the number of levels in the predictor variable.
    - The higher the H-statistic, the stronger the interaction effect.
    - The article walks through the calculation step by step, using a hypothetical dataset on the effect of asbestos exposure on lung cancer in smokers and non-smokers (a code sketch of the standard formulation appears after this list of bookmarks).
    - The author also discusses the assumptions and limitations of the H-statistic, such as the need for balanced data and the inability to detect interactions among more than two variables.
  2. Additive Decision Trees are a variation of standard decision trees, constructed in a way that can often allow them to be more accurate, more interpretable, or both. This article explains the intuition behind Additive Decision Trees and how they can be constructed.
  3. This paper explores whether some language model representations may be inherently multi-dimensional, contrasting the linear representation hypothesis. The authors develop a method using sparse autoencoders to find multi-dimensional features in GPT-2 and Mistral 7B. They find interpretable examples such as circular features representing days of the week and months of the year, which are used to solve computational problems involving modular arithmetic.
  4. "scaling sparse autoencoders has been a major priority of the Anthropic interpretability team, and we're pleased to report extracting high-quality features from Claude 3 Sonnet, 1 Anthropic's medium-sized production model.

    We find a diversity of highly abstract features. They both respond to and behaviorally cause abstract behaviors. Examples of features we find include features for famous people, features for countries and cities, and features tracking type signatures in code. Many features are multilingual (responding to the same concept across languages) and multimodal (responding to the same concept in both text and images), as well as encompassing both abstract and concrete instantiations of the same idea (such as code with security vulnerabilities, and abstract discussion of security vulnerabilities)."
  5. Generating counterfactual explanations got a lot easier with CFNOW, but what are counterfactual explanations, and how can I use them?
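
Regarding bookmark 1: the article's own step-by-step example is not reproduced here. As a point of reference, the sketch below computes the two-way H-statistic in its standard partial-dependence formulation (Friedman & Popescu, 2008), where values near 0 indicate no interaction and values near 1 a strong one. The dataset, the `GradientBoostingRegressor` model, and the feature indices are illustrative assumptions, not taken from the bookmarked article.

```python
# Minimal sketch of the partial-dependence form of Friedman's H-statistic.
# Dataset, model, and feature pairs are illustrative choices, not the article's.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

def centered_partial_dependence(model, X, cols):
    """Empirical partial dependence of `model` on the features in `cols`,
    evaluated at each row of X and centered to mean zero."""
    pd_vals = np.empty(len(X))
    for i, row in enumerate(X):
        X_mod = X.copy()
        X_mod[:, cols] = row[cols]        # clamp the chosen features to row i's values
        pd_vals[i] = model.predict(X_mod).mean()
    return pd_vals - pd_vals.mean()

def h_statistic_squared(model, X, j, k):
    """Two-way H^2 for features j and k: the share of the joint partial
    dependence variance not explained by the two one-way effects."""
    pd_jk = centered_partial_dependence(model, X, [j, k])
    pd_j = centered_partial_dependence(model, X, [j])
    pd_k = centered_partial_dependence(model, X, [k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)

# Friedman #1 has a genuine x0*x1 interaction, so H^2(0, 1) should be clearly
# larger than H^2 for a non-interacting pair such as (2, 3).
X, y = make_friedman1(n_samples=300, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)
X_eval = X[:100]                          # subsample to keep the O(n^2) loop cheap
print("H^2(0, 1):", h_statistic_squared(model, X_eval, 0, 1))
print("H^2(2, 3):", h_statistic_squared(model, X_eval, 2, 3))
```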
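
Regarding bookmark 5: a counterfactual explanation is a minimal change to an input that flips the model's prediction. The sketch below illustrates the idea with a naive greedy search over a linear classifier; it does not use CFNOW's API, and the dataset, model, and step size are arbitrary illustrative choices.

```python
# Toy counterfactual search: nudge one feature until the prediction flips.
# Illustrative only; CFNOW performs its own, more sophisticated search.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def greedy_counterfactual(model, x, step=0.1, max_iter=500):
    """Nudge the most influential feature of a linear classifier until the
    predicted class flips; returns the counterfactual point and its delta."""
    x_cf = x.astype(float).copy()
    original = model.predict(x.reshape(1, -1))[0]
    coefs = model.coef_[0]                 # linear model: coefficients as sensitivities
    idx = int(np.argmax(np.abs(coefs)))    # single most influential feature
    # Move against the decision function if the original class is 1, with it otherwise.
    direction = -np.sign(coefs[idx]) if original == 1 else np.sign(coefs[idx])
    for _ in range(max_iter):
        if model.predict(x_cf.reshape(1, -1))[0] != original:
            break
        x_cf[idx] += step * direction
    return x_cf, x_cf - x

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)
x_cf, delta = greedy_counterfactual(clf, X[0])
print("original class:", clf.predict(X[:1])[0],
      "counterfactual class:", clf.predict(x_cf.reshape(1, -1))[0])
print("feature changes:", delta)
```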
