This article details research into the optimal architecture for small language models (~70M parameters): it explores depth-width tradeoffs, compares different architectures, and introduces Dhara-70M, a diffusion model that delivers 3.8x higher throughput with improved factuality.
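To make the depth-width tradeoff concrete, here is a minimal sketch (not the article's code) that estimates parameter counts for a standard decoder-only transformer and sweeps depth/width pairs near a 70M budget; the 32k vocabulary, tied embeddings, and `d_ff = 4 * d_model` are assumptions, not the article's actual configuration.

```python
# Rough parameter accounting for a decoder-only transformer, used to compare
# deep-narrow vs. shallow-wide configurations at a fixed ~70M budget.
# Assumptions: tied input/output embeddings, d_ff = 4 * d_model, biases/LayerNorm ignored.

VOCAB_SIZE = 32_000  # assumed vocabulary size; the article's tokenizer may differ

def transformer_params(depth: int, d_model: int, vocab: int = VOCAB_SIZE) -> int:
    """Approximate total parameter count (embedding + per-layer weights)."""
    embed = vocab * d_model                  # tied token embedding / LM head
    attn_per_layer = 4 * d_model * d_model   # Q, K, V, and output projections
    mlp_per_layer = 8 * d_model * d_model    # two MLP matrices with d_ff = 4 * d_model
    return embed + depth * (attn_per_layer + mlp_per_layer)

if __name__ == "__main__":
    # Sweep a few depth/width combinations and keep those within 15% of 70M parameters.
    target, tol = 70e6, 0.15
    for depth in (4, 6, 8, 12, 16, 24, 32):
        for d_model in (256, 320, 384, 448, 512, 640, 768):
            n = transformer_params(depth, d_model)
            if abs(n - target) / target < tol:
                print(f"depth={depth:>2}  d_model={d_model:>4}  params≈{n/1e6:5.1f}M")
```

Running this prints several roughly iso-parameter configurations (e.g. 16 layers at width 512 vs. 32 layers at width 384), which is the kind of comparison the depth-width study performs.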
Mercury dLLMs are up to 10x faster and cheaper than current LLMs, offering high-quality text generation with improved reasoning and error correction.