This paper surveys recent replication studies of DeepSeek-R1, focusing on Supervised Fine-Tuning (SFT) and Reinforcement Learning from Verifiable Rewards (RLVR). It details data construction, method design, and training procedures, offering insights and anticipating future research directions for reasoning language models.
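
As a rough illustration of the RLVR side of these recipes, the sketch below shows a rule-based verifiable reward that checks a sampled completion's final answer against a ground-truth label. The answer-extraction convention (a `\boxed{...}` span) and the function names are assumptions for illustration, not taken from any specific surveyed paper.

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the last \\boxed{...} answer out of a completion (illustrative convention)."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary rule-based reward: 1.0 if the extracted answer matches the label, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == ground_truth.strip() else 0.0

# The reward signal used to score sampled completions during RL training.
print(verifiable_reward("... so the result is \\boxed{42}", "42"))  # 1.0
print(verifiable_reward("... the answer is 41", "42"))              # 0.0
```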
   
    The paper introduces test-time scaling, an approach that improves language model performance by spending additional compute at inference time. The authors curate a small dataset and propose budget forcing, a technique that controls how much compute the model spends on reasoning, allowing it to double-check its answers. Applied to the Qwen2.5-32B-Instruct language model, the approach yields significant improvements on competition math questions.
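
To make the budget-forcing idea concrete, here is a minimal sketch under stated assumptions: a hypothetical `generate(text, max_new_tokens, stop)` decoding helper and a `</think>` end-of-reasoning delimiter. The sketch caps the reasoning trace at a token budget and, when the model tries to stop early, appends "Wait" so it re-examines its answer.

```python
def budget_forced_generate(generate, prompt: str, max_thinking_tokens: int,
                           min_thinking_extensions: int = 1,
                           end_think: str = "</think>") -> str:
    """Sketch of budget forcing: cap the reasoning trace at a token budget, and when the
    model tries to stop early, append 'Wait' to push it to keep checking its answer.
    `generate(text, max_new_tokens, stop)` is a hypothetical decoding helper."""
    trace = ""
    extensions_used = 0
    while True:
        remaining = max_thinking_tokens - len(trace.split())  # crude token count for illustration
        if remaining <= 0:
            break  # budget exhausted: force the reasoning phase to end
        chunk = generate(prompt + trace, max_new_tokens=remaining, stop=end_think)
        trace += chunk
        if extensions_used < min_thinking_extensions:
            trace += " Wait"  # suppress stopping; ask the model for another check
            extensions_used += 1
        else:
            break
    # Close the reasoning block and let the model produce its final answer.
    return generate(prompt + trace + end_think, max_new_tokens=256, stop=None)
```

The appeal of this design is that it is purely a decoding-time intervention: the same fine-tuned model can be pushed to think more or less without any retraining.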
   
    This model is trained on a large dataset composed primarily of GPT-4-generated data, supplemented with high-quality examples from open datasets, and it performs strongly across a range of tasks. It is released as an SFT + DPO version, with an SFT-only variant also available.
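
For readers unfamiliar with the DPO half of the SFT + DPO recipe, the following is a minimal sketch of the DPO objective over a batch of preference pairs (the argument names and toy numbers are illustrative): it widens the margin between the policy's log-probabilities on chosen versus rejected responses, measured relative to a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.
    Each argument is a tensor of per-sequence log-probabilities (summed over tokens)."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example with toy log-probabilities for two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-15.0, -14.0]),
                torch.tensor([-13.0, -10.0]), torch.tensor([-14.0, -13.5]))
print(loss.item())
```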
   
    ChatQA is a family of conversational question-answering (QA) models developed by NVIDIA. These models employ a two-stage instruction tuning method that significantly improves the zero-shot conversational QA performance of large language models (LLMs). The ChatQA-70B variant outperforms GPT-4 across multiple conversational QA datasets.
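
As a rough sketch of what a second-stage, context-enhanced instruction-tuning example for conversational QA might look like, the snippet below assembles a grounding document, prior turns, and the current question into a single training prompt. The template and field names are assumptions for illustration, not ChatQA's actual format.

```python
def build_stage2_example(context: str, turns: list[tuple[str, str]],
                         question: str, answer: str) -> dict:
    """Assemble one context-enhanced instruction-tuning example: a grounding document,
    prior conversation turns, the current question, and the target answer.
    The field names and template are illustrative, not the paper's exact format."""
    history = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in turns)
    prompt = (
        "System: Answer the user's question using only the given context.\n\n"
        f"Context: {context}\n\n"
        f"{history}\n"
        f"User: {question}\nAssistant:"
    )
    return {"prompt": prompt, "target": answer}

example = build_stage2_example(
    context="A short passage that grounds the conversation.",
    turns=[("What is ChatQA?", "A family of conversational QA models from NVIDIA.")],
    question="How is it trained?",
    answer="With a two-stage instruction tuning recipe.",
)
print(example["prompt"])
```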