This article details a test of five local AI coding models (Qwen3 Coder Next, Qwen3.5-122B-A10B, Devstral 2 123B, gpt-oss-120b, and Omnicoder-9B) using a specific prompt to build a CLI static site generator in Python. The author found a significant performance gap: Qwen3 Coder Next consistently outperformed the others, especially when using Context7 for live documentation access. The test highlights how access to current documentation helps models overcome biases baked into their training data, and how inconsistently local models leverage such tools. The article also points out common mistakes all five models made due to those training-data biases.