The article details “autoresearch,” a project by Karpathy in which an AI agent autonomously experiments with training a small language model (nanochat) to improve its performance. The agent modifies the `train.py` file, trains for a fixed 5-minute budget, and evaluates the results, repeating this loop to iteratively refine the model. The project aims to demonstrate autonomous AI research in a deliberately simplified setting: a single GPU, a single editable file, and one clear metric (validation bits per byte).
* **Autonomous Research:** The core concept of AI-driven experimentation.
* **nanochat:** The small language model used for training.
* **Fixed Time Budget:** Each experiment runs for exactly 5 minutes.
* **program.md:** The file containing instructions for the AI agent.
* **Single-File Modification:** The agent only edits `train.py`.
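The loop described above can be sketched as follows. This is a minimal illustration, not the project's actual harness: `propose_edit` and `train_and_eval` are hypothetical stand-ins (the article does not specify how the agent's edits or the training run are invoked), and the training step is simulated rather than run for five real minutes.

```python
import random

def propose_edit(history):
    # Stand-in for the agent's edit to train.py (hypothetical).
    # A real agent would inspect `history` and rewrite the file;
    # here we just simulate choosing a hyperparameter.
    return {"lr": random.choice([1e-3, 3e-4, 1e-4])}

def train_and_eval(config):
    # Stand-in for a fixed 5-minute nanochat training run followed
    # by evaluation. Returns simulated validation bits per byte
    # (lower is better).
    bonus = 0.1 if config["lr"] == 3e-4 else 0.0
    return 1.5 - bonus + random.uniform(0.0, 0.01)

def autoresearch_loop(n_experiments=10):
    # Repeat: edit -> train (fixed budget) -> evaluate -> record.
    best_bpb = float("inf")
    history = []
    for _ in range(n_experiments):
        config = propose_edit(history)
        bpb = train_and_eval(config)
        history.append((config, bpb))
        best_bpb = min(best_bpb, bpb)  # track best validation bits/byte
    return best_bpb, history
```

Each iteration maps to one 5-minute experiment in the article; the only state carried between experiments is the edit history and the metric.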