SemanticScuttle - klotz.me

klotz: mxfp4*

This article details benchmarks for Unsloth Dynamic GGUFs of the Qwen3.5 model, including analysis of perplexity, KL divergence, and MXFP4. It covers performance across different bit widths and quant types, highlighting the impact of imatrix (importance matrix) calibration and the limitations of certain quantization approaches. Full benchmark data is also provided.

2026-03-01 Tags: qwen3.5, gguf, benchmarks, quantization, perplexity, kl divergence, mxfp4, imatrix, llm, inference, dynamic quantization, unsloth by klotz

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

This blog post details a fine-tuning workflow for the gpt-oss model that recovers post-training accuracy while retaining the performance benefits of FP4. The workflow applies supervised fine-tuning (SFT) to a version of the model upcast to BF16, followed by quantization-aware training (QAT) using NVIDIA TensorRT Model Optimizer. The article also discusses the benefits of using NVFP4 for even better convergence and accuracy recovery.

2025-08-30 Tags: gpt-oss, fine-tuning, quantization-aware training, qat, tensorrt model optimizer, mxfp4, nvfp4, bf16, fp4, llm, nvidia by klotz
