klotz: diffusion*

0 bookmark(s) - Sort by: Date ↓ / Title / - Bookmarks from other users for this tag

  1. Steerling-8B is an interpretable causal diffusion language model that combines masked diffusion language modeling with concept decomposition, enabling generation, attribution, steering, and extraction of hidden representations. It offers features like block-causal attention and decomposition of hidden states into known and unknown concepts.
  2. This article details research into finding the optimal architecture for small language models (70M parameters), exploring depth-width tradeoffs, comparing different architectures, and introducing Dhara-70M, a diffusion model offering 3.8x faster throughput with improved factuality.
  3. Mercury dLLMs are up to 10x faster and cheaper than current LLMs, offering high-quality text generation with improved reasoning and error correction.
  4. 2024-02-01 Tags: , , , , , by klotz

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: Tags: diffusion

About - Propulsed by SemanticScuttle