SuperCoder is a coding agent that runs in your terminal, offering features like code search, project structure exploration, code editing, bug fixing, and integration with OpenAI or local models.
Comparison of 17 LLMs against 4 typed programming languages to determine which excels at what. The article discusses the impact of LLMs on software development and the varying quality of LLM's output in different programming languages.
- Demonstrates how to improve two pretrained models' proficiency in the Dafny verified programming language.
- Uses 178 programming problems from the MBPP dataset for prompting GPT-4 and PaLM-2 to generate methods in Dafny.
- Three types of prompts were used: a direct contextless prompt, one that includes a signature of the method and test cases, and a third one that decomposes the problem into steps and includes dynamically chosen similar examples.
- GPT-4 was able to generate verified (and human-evaluated) Dafny methods in 58% of the cases with the third prompt.
- Contributes a collection of 153 MBPP problems implemented and formally verified in Dafny, 50 written by authors and 103 synthesized by GPT-4.