The Allen Institute for AI has released the Tulu 2.5 suite, a collection of language models trained with Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO). The suite spans models trained on a variety of preference datasets and also includes the associated reward and value models used during PPO training. The release aims to improve language model performance across several domains.
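For context on the PPO side of the suite, the standard clipped surrogate objective (from Schulman et al., 2017, and not specific to this release) is

$$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},$$

where $\hat{A}_t$ is the advantage estimate computed with the value model and $\epsilon$ is the clipping range. In RLHF-style setups, the reward typically comes from a learned reward model plus a KL penalty against a reference policy, which is why the suite ships reward and value models alongside the policy checkpoints.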
This article surveys recent open LLM (large language model) releases, including Mixtral 8x22B, Meta AI's Llama 3, and Microsoft's Phi-3, comparing their performance on the MMLU benchmark. It also covers Apple's OpenELM, an efficient language model family released with an open-source training and inference framework, and examines the use of the PPO and DPO algorithms for instruction finetuning and alignment in LLMs.
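Since the article contrasts DPO with PPO, it may help to recall the standard DPO objective (Rafailov et al., 2023), which optimizes directly on preference pairs without training a separate reward model:

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right],$$

where $y_w$ and $y_l$ are the preferred and rejected responses for prompt $x$, $\pi_{\mathrm{ref}}$ is the reference (typically SFT) policy, and $\beta$ controls the strength of the implicit KL constraint. This single supervised-style loss, versus PPO's reward model, value model, and on-policy sampling, is the practical trade-off at the heart of the DPO-versus-PPO comparison.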