SemanticScuttle - klotz.me » klotz: features+anthropic

klotz: features* + anthropic*

Anthropic decoded the vectors Claude uses to represent abstract concepts

Last week, Anthropic announced a significant breakthrough in our understanding of how large language models work. The research focused on Claude 3 Sonnet, the mid-sized version of Anthropic’s latest frontier model. Anthropic showed that it could decompose Claude's otherwise inscrutable numeric representation of words into a combination of ‘features’, many of which humans can interpret. Each feature is itself a vector that represents an abstract concept, ranging from immunology to coding errors to the Golden Gate Bridge, and the vector Claude uses for a word can be understood as a sum of such features. This research could prove useful for Anthropic and the broader industry, potentially leading to new tools to detect model misbehavior or prevent it altogether.

2024-06-06 Tags: anthropic, claude, large language model, vectors, features, abstract concepts, ontology by klotz
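
The "sum of features" idea in the entry above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three feature names are borrowed from the article's examples, the 4-dimensional directions are invented, and the directions are chosen to be orthonormal so that simple projection recovers the activations. It does not reproduce Anthropic's actual method or Claude's real representations.

```python
# Hypothetical toy dictionary of feature directions (invented values,
# chosen to be orthonormal so that projection recovers activations).
FEATURES = {
    "golden_gate_bridge": [1.0, 0.0, 0.0, 0.0],
    "immunology":         [0.0, 1.0, 0.0, 0.0],
    "coding_error":       [0.0, 0.0, 1.0, 0.0],
}


def compose(activations):
    """Build a vector as the sum of feature directions scaled by activation."""
    dim = len(next(iter(FEATURES.values())))
    vec = [0.0] * dim
    for name, strength in activations.items():
        for i, component in enumerate(FEATURES[name]):
            vec[i] += strength * component
    return vec


def decompose(vec):
    """Project a vector onto each toy feature direction.

    This only works because the toy directions are orthonormal; it is
    not how features are found in a real model.
    """
    return {
        name: sum(v * d for v, d in zip(vec, direction))
        for name, direction in FEATURES.items()
    }


if __name__ == "__main__":
    word_vector = compose({"golden_gate_bridge": 2.0, "coding_error": 0.5})
    print(word_vector)
    print(decompose(word_vector))
```

In this toy, a vector built from two active features decomposes back into those same two activations, with the unused feature at zero. That mirrors the claim that most features are inactive for any given input.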

Mapping the Mind of a Large Language Model May 21, 2024

"...a feature that activates when Claude reads a scam email (this presumably supports the model’s ability to recognize such emails and warn you not to respond to them). Normally, if one asks Claude to generate a scam email, it will refuse to do so. But when we ask the same question with the feature artificially activated sufficiently strongly, this overcomes Claude's harmlessness training and it responds by drafting a scam email."

2024-05-21 Tags: claude, anthropic, llm, ontology, features, semantic web, spam, email by klotz
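
The quote above describes artificially activating a feature. A minimal sketch of that idea, assuming a feature is a unit direction in the model's internal vector space and that "activating" it means adding a scaled copy of that direction: the names `steer`, `activation`, and `scam_email_direction` are hypothetical, and the numbers are invented. Real interventions operate inside a running model, which this toy does not attempt.

```python
def activation(vec, direction):
    """Measure how strongly a feature is present: the dot product
    of the vector with the (unit-length) feature direction."""
    return sum(v * d for v, d in zip(vec, direction))


def steer(vec, direction, strength):
    """Return a copy of vec with the feature direction added at the
    given strength (a toy stand-in for artificially activating it)."""
    return [v + strength * d for v, d in zip(vec, direction)]


if __name__ == "__main__":
    # Hypothetical 3-dimensional example; values are invented.
    scam_email_direction = [0.0, 1.0, 0.0]
    internal_state = [0.25, 0.5, -0.75]

    print(activation(internal_state, scam_email_direction))
    steered = steer(internal_state, scam_email_direction, 8.0)
    print(activation(steered, scam_email_direction))
```

The steered vector differs from the original only along the chosen direction, so the measured activation rises by exactly the added strength while the other components stay unchanged.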
