A new study reveals that while current AI models excel at solving math *problems*, they struggle with the *reasoning* required for mathematical *proofs*, demonstrating a gap between pattern recognition and genuine mathematical understanding.
This paper explores the cultural evolution of cooperation among LLM agents through a variant of the Donor Game, finding significant differences in cooperative behavior across various base models and initial strategies.
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
How computationally optimized prompts make language models excel, and what this means for prompt engineering.
A detailed analysis of the DeepMind/Meta study on how large language models achieve unprecedented compression rates on text, image, and audio data, and what these results imply.