SemanticScuttle - klotz.me » klotz: quantization-aware training

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

This blog post details a fine-tuning workflow for the gpt-oss model that recovers post-training accuracy while retaining the performance benefits of FP4. The workflow runs supervised fine-tuning (SFT) on an upcast BF16 version of the model, followed by quantization-aware training (QAT) with NVIDIA TensorRT Model Optimizer. The article also discusses the benefits of using NVFP4 for even better convergence and accuracy recovery.

2025-08-30 Tags: gpt-oss, fine-tuning, quantization-aware training, qat, tensorrt model optimizer, mxfp4, nvfp4, bf16, fp4, llm, nvidia by klotz
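
For readers unfamiliar with QAT, the core idea in the summary above can be sketched as "fake quantization": values are rounded to the low-precision grid and immediately dequantized, so the forward pass of training sees the rounding error the deployed FP4 model will have. The NumPy sketch below is illustrative only and is not the TensorRT Model Optimizer API. The function name `fake_quant_fp4` and its parameters are hypothetical, the power-of-two scale is a simplified stand-in for MXFP4-style block scaling (32-element blocks), and the unrounded scale with `block_size=16` is only a rough stand-in for NVFP4, which stores its block scale in FP8.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (the sign bit is handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def fake_quant_fp4(x, block_size=32, pow2_scale=True):
    """Quantize to block-scaled FP4 and dequantize again (hypothetical helper).

    Assumes x.size is divisible by block_size.
    """
    x = np.asarray(x, dtype=np.float64)
    blocks = x.reshape(-1, block_size)

    # One scale per block, chosen so the largest magnitude maps to at most 6.0.
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / FP4_GRID[-1], 1.0)
    if pow2_scale:
        # Simplification: round the scale up to a power of two so nothing clips.
        scale = 2.0 ** np.ceil(np.log2(scale))

    # Snap each scaled magnitude to the nearest representable FP4 value.
    scaled = blocks / scale
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    quantized = np.sign(scaled) * FP4_GRID[idx]

    # Dequantize so downstream computation stays in high precision.
    return (quantized * scale).reshape(x.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=64)
    for name, kwargs in [
        ("pow2 scale, block 32", dict(block_size=32, pow2_scale=True)),
        ("float scale, block 16", dict(block_size=16, pow2_scale=False)),
    ]:
        err = np.abs(fake_quant_fp4(weights, **kwargs) - weights).mean()
        print(f"{name}: mean abs error = {err:.4f}")
```

In an actual QAT loop the rounding step is typically paired with a straight-through estimator, so gradients flow to the high-precision weights as if the rounding were the identity. That part needs an autograd framework and is left out of this sketch.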
