Meta releases Llama 3.1, its largest and best model yet, surpassing GPT-4o on several benchmarks. Zuckerberg believes this marks the 'Linux moment' in AI, opening the door for open-source models to flourish.
This tutorial provides a step-by-step guide to building an LLM router that balances high-quality closed LLMs like GPT-4 against cost-effective open-source LLMs, achieving high response quality while minimizing cost. The approach covers preparing labeled data, fine-tuning a causal LLM classifier, and offline evaluation with the RouteLLM framework.
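The core routing idea can be sketched in a few lines. This is a minimal illustration, not the actual RouteLLM implementation: `quality_score` stands in for the fine-tuned classifier, which in practice predicts how likely the cheap model is to answer well, and the model names and threshold are assumptions.

```python
from dataclasses import dataclass

@dataclass
class RoutingDecision:
    model: str
    score: float

def route(prompt: str, quality_score, threshold: float = 0.5,
          strong_model: str = "gpt-4", weak_model: str = "open-llm") -> RoutingDecision:
    """Send the prompt to the cheap model when the classifier predicts it
    will do well enough; otherwise fall back to the expensive model."""
    score = quality_score(prompt)  # predicted win rate of the weak model, in [0, 1]
    model = weak_model if score >= threshold else strong_model
    return RoutingDecision(model=model, score=score)

# Toy stand-in classifier: treat short prompts as "easy", long ones as "hard".
toy_classifier = lambda p: 1.0 if len(p.split()) < 20 else 0.2

print(route("What is 2 + 2?", toy_classifier).model)  # open-llm
```

In the tutorial's setting, the threshold is what gets tuned during offline evaluation: raising it sends more traffic to the strong model, trading cost for quality.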
OpenAI introduces GPT-4, a new large language model that surpasses human performance on various tasks. Although not yet publicly available, the article provides insights into its capabilities and how it sets a new standard for AI.
Researchers from NYU Tandon School of Engineering investigated whether modern natural language processing systems could solve the daily Connections puzzles from The New York Times. The results showed that while all the AI systems could solve some of the puzzles, they struggled overall.
This tutorial introduces promptrefiner, a tool created by Amirarsalan Rajabi that uses GPT-4 to iteratively craft effective system prompts for local LLMs.
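The refinement loop behind tools like this can be sketched as follows. The function names and loop structure here are assumptions for illustration, not promptrefiner's actual API; the toy stand-ins let the sketch run without API keys.

```python
def refine_system_prompt(initial_prompt, local_llm, critique_and_rewrite, rounds=3):
    """Repeatedly run the local LLM, then ask a stronger model (e.g. GPT-4)
    to critique the output and rewrite the system prompt accordingly."""
    prompt = initial_prompt
    for _ in range(rounds):
        output = local_llm(prompt)           # try the current system prompt
        prompt = critique_and_rewrite(prompt, output)  # stronger model improves it
    return prompt

# Toy stand-ins: a fake local model and a critic that appends an instruction.
local_llm = lambda p: f"response under: {p}"
critic = lambda p, out: p + " Be concise."

print(refine_system_prompt("You are a helpful assistant.", local_llm, critic, rounds=2))
```

In practice, `critique_and_rewrite` would be a GPT-4 call that sees both the current prompt and the local model's output and returns a revised prompt.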
- Demonstrates how to improve two pretrained models' proficiency in the Dafny verified programming language.
- Uses 178 programming problems from the MBPP dataset for prompting GPT-4 and PaLM-2 to generate methods in Dafny.
- Three types of prompts were used: a direct contextless prompt, one that includes a signature of the method and test cases, and a third one that decomposes the problem into steps and includes dynamically chosen similar examples.
- GPT-4 was able to generate verified (and human-evaluated) Dafny methods in 58% of the cases with the third prompt.
- Contributes a collection of 153 MBPP problems implemented and formally verified in Dafny, 50 written by the authors and 103 synthesized by GPT-4.
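The three prompting strategies above can be sketched as prompt builders. The helper names and the token-overlap retrieval are illustrative assumptions, not the paper's code; the paper's dynamic example selection is likely more sophisticated.

```python
def contextless_prompt(problem: str) -> str:
    """Type 1: direct, no context."""
    return f"Write a Dafny method for: {problem}"

def signature_prompt(problem: str, signature: str, tests: list[str]) -> str:
    """Type 2: include the method signature and test cases."""
    tests_block = "\n".join(tests)
    return (f"Write a Dafny method for: {problem}\n"
            f"Signature: {signature}\nTest cases:\n{tests_block}")

def retrieve_similar(problem: str, examples: list[tuple[str, str]], k: int = 2):
    """Naive dynamic example selection by word overlap with the problem text."""
    words = set(problem.lower().split())
    return sorted(examples,
                  key=lambda ex: len(words & set(ex[0].lower().split())),
                  reverse=True)[:k]

def decomposed_prompt(problem: str, steps: list[str], examples) -> str:
    """Type 3: decompose into steps and prepend dynamically chosen examples."""
    shots = "\n\n".join(f"Problem: {p}\nSolution:\n{s}" for p, s in examples)
    plan = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps))
    return (f"{shots}\n\nSolve step by step:\n{plan}\n"
            f"Write a verified Dafny method for: {problem}")
```

The third builder mirrors the strategy that reached the 58% verification rate: few-shot examples chosen per problem, plus an explicit decomposition.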
The author experiments with DALL-E 3, asking it to add a walrus to a prompt, and is surprised to find that the model can maintain consistency between images across slightly altered prompts by reusing a "seed" number. The author also digs into DALL-E 3's underlying prompt engineering, revealing the policies and guidelines that govern its image generation, including diversity and inclusivity guidelines.