SemanticScuttle - klotz.me » klotz: monosemanticity+polysemanticity+anthropic


Scaling Monosemanticity: Anthropic’s One Step Towards Interpretable & Manipulable LLMs

An article on monosemanticity in LLMs (large language models) and on how Anthropic is working to make them more controllable and safer through prompt and activation engineering.

2024-05-29 Tags: llm, neural networks, monosemanticity, polysemanticity, prompt engineering, anthropic by klotz
