SemanticScuttle - klotz.me » Tags: fine-tuning

Tags: fine-tuning*


This article details the performance of Unsloth Dynamic GGUFs on the Aider Polyglot benchmark, showing how they can quantize LLMs like DeepSeek-V3.1 to as low as 1-bit while outperforming models like GPT-4.5 and Claude-4-Opus. It also covers the benchmark setup, comparisons to other quantization methods, and chat template bug fixes.

2025-10-13 Tags: unsloth, gguf, aider polyglot, llm, quantization, deepseek-v3.1, gpt-4, claude-4, model compression, fine-tuning, inference by klotz

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

This blog post details a fine-tuning workflow for the gpt-oss model that recovers post-training accuracy while retaining the performance benefits of FP4. The workflow runs supervised fine-tuning (SFT) on a version of the model upcast to BF16, followed by quantization-aware training (QAT) using NVIDIA TensorRT Model Optimizer. The article also discusses the benefits of using NVFP4 for even better convergence and accuracy recovery.
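
To make the QAT step more concrete, here is a minimal sketch of "fake quantization", the quantize-then-dequantize rounding that QAT typically applies in the forward pass so the model learns to tolerate low-precision weights. It is not the TensorRT Model Optimizer API. The function name `fake_quantize_block` is hypothetical, the per-block scale is a plain float (real MXFP4 and NVFP4 store the scale in a restricted format), and the gradient handling that training needs is omitted.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (sign bit handled separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fake_quantize_block(block):
    """Round each value in a block to the nearest FP4 (E2M1) value and map it back.

    Assumption: one shared float scale per block, chosen so that the largest
    magnitude in the block lands on the top of the FP4 grid.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / E2M1_GRID[-1]
    out = []
    for x in block:
        # Nearest representable magnitude in the scaled domain.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out


if __name__ == "__main__":
    print(fake_quantize_block([6.0, -3.0, 0.7, 0.0]))
```

In this sketch the value 0.7 is rounded to 0.5 because the grid has no closer point; QAT exposes the model to exactly this kind of rounding error during training.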

2025-08-30 Tags: gpt-oss, fine-tuning, quantization-aware training, qat, tensorrt model optimizer, mxfp4, nvfp4, bf16, fp4, llm, nvidia by klotz

Gemma 3: How to Run & Fine-tune

How to run Gemma 3 effectively with Unsloth's GGUFs on llama.cpp, Ollama, and Open WebUI, and how to fine-tune it with Unsloth. This page details running Gemma 3 on various platforms, including phones, and fine-tuning it using Unsloth, addressing potential issues with float16 precision and providing optimal configuration settings.

2025-08-16 Tags: gemma 3, llm, fine-tuning, llama.cpp, unsloth, gguf, gpu, colab, vision, audio, oobabooga by klotz

Devstral: How to Run & Fine-tune | Unsloth Documentation

Learn how to run and fine-tune Mistral Devstral 1.1, including Small-2507 and 2505. This guide covers official recommended settings, tutorials for running Devstral in Ollama and llama.cpp, experimental vision support, and fine-tuning with Unsloth.

2025-07-11 Tags: devstral, mistral, unsloth, fine-tuning, llm, ollama, llama.cpp, vision by klotz

Building LLM Workflows -- some observations

A post with pithy observations and clear conclusions from building complex LLM workflows, covering topics like prompt chaining, data structuring, model limitations, and fine-tuning strategies.

2025-05-09 Tags: llm, localllama, prompt engineering, fine-tuning, agentic loops, context window, bert, xml, cot, workflow, reddit by klotz

Tutorial: How to Run & Fine-tune Gemma 3

This document details how to run and fine-tune Gemma 3 models (1B, 4B, 12B, and 27B) using Unsloth, covering setup with Ollama and llama.cpp, and addressing potential float16 precision issues. It also highlights Unsloth's unique ability to run Gemma 3 in float16 on machines like Colab notebooks with Tesla T4 GPUs.

2025-04-09 Tags: gemma 3, unsloth, llama.cpp, ollama, fine-tuning, llm, inference by klotz

Training Large Language Models with Interpreter Feedback using WebAssembly

This article details a method for training large language models (LLMs) for code generation using a secure, local WebAssembly-based code interpreter and reinforcement learning with Group Relative Policy Optimization (GRPO). It covers the setup, training process, evaluation, and potential next steps.

2025-04-04 Tags: huggingface, llm, training, code generation, webassembly, wasm, grpo, reinforcement learning, axolotl, code interpreter, fine-tuning, python by klotz

Are You Still Using LoRA to Fine-Tune Your LLM?

A look at this year’s crop of LoRA alternatives, including SVF, SVFT, MiLoRA, PiSSA, and LoRA-XS, all based on SVD (Singular Value Decomposition). The article compares these techniques to the original LoRA method for fine-tuning Large Language Models.

| Method | Description | Key Feature(s) | Reference |
|--------------|---------------------------------------------|---------------------------------------------|-|
| LoRA | Freezes the model and trains a small pair of low-rank “adapter” matrices. | Saves memory and compute cycles by reducing the number of trainable parameters. | arxiv.org/abs/2106.09685 |
| SVF | Uses SVD on the model’s weight matrices and fine-tunes the singular values directly. | More economical in parameters than LoRA; makes tuned models composable. | arxiv.org/abs/2501.06252v2 |
| SVFT | Adds trainable weights beyond the diagonal of the singular-value matrix and evaluates several sparsity patterns. | Provides more trainable values than just the diagonal, useful for better fine-tuning. | arxiv.org/abs/2405.19597 |
| PiSSA | Tunes only the largest (principal) singular values and vectors. | Designed to approximate full fine-tuning by adapting the principal singular components. | arxiv.org/abs/2404.02948 |
| MiLoRA | Tunes only the smallest (minor) singular values and vectors. | Retains base model’s knowledge while adapting to new tasks. | arxiv.org/abs/2406.09044 |
| LoRA-XS | Similar to PiSSA, but trains only a small r × r matrix placed between frozen SVD-derived factors. | Shows good results with significantly fewer parameters than LoRA. | arxiv.org/abs/2405.17604 |
| DoRA | Splits weights into magnitudes and directions then tunes those. | | arxiv.org/abs/2402.09353 |
| AdaLoRA | Complex mechanism for finding the best tuning rank for a given budget of trainable weights. | | arxiv.org/abs/2303.10512 |
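
The parameter savings mentioned in the table can be checked with a little arithmetic. The sketch below counts trainable parameters for a single d × k weight matrix, under these assumptions: LoRA trains two factors of shapes d × r and r × k, SVF trains one scale per singular value, and LoRA-XS trains a single r × r matrix. The function names are illustrative and do not come from any library.

```python
def full_finetune_params(d, k):
    """Every entry of the d x k weight matrix is trainable."""
    return d * k


def lora_params(d, k, r):
    """Two low-rank adapter matrices: (d x r) and (r x k)."""
    return r * (d + k)


def svf_params(d, k):
    """One trainable scale per singular value of the weight matrix."""
    return min(d, k)


def lora_xs_params(r):
    """A single trainable r x r matrix between frozen factors."""
    return r * r


if __name__ == "__main__":
    d, k, r = 4096, 4096, 16  # example sizes, chosen only for illustration
    print("full fine-tuning:", full_finetune_params(d, k))
    print("LoRA:", lora_params(d, k, r))
    print("SVF:", svf_params(d, k))
    print("LoRA-XS:", lora_xs_params(r))
```

For a 4096 × 4096 matrix at rank 16 this gives 16,777,216 parameters for full fine-tuning, 131,072 for LoRA, 4,096 for SVF, and 256 for LoRA-XS.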

2025-03-14 Tags: lora, llm, fine-tuning by klotz

6 Common LLM Customization Strategies Briefly Explained

The article explains six essential strategies for customizing Large Language Models (LLMs) to better meet specific business needs or domain requirements. These strategies include Prompt Engineering, Decoding and Sampling Strategy, Retrieval Augmented Generation (RAG), Agent, Fine-Tuning, and Reinforcement Learning from Human Feedback (RLHF). Each strategy is described with its benefits, limitations, and implementation approaches to align LLMs with specific objectives.

2025-02-25 Tags: llm, prompt engineering, rag, agent, fine-tuning, rlhf by klotz

Tutorial to Fine-Tuning Mistral 7B with QLoRA Using Axolotl for Efficient LLM Training

This tutorial guides readers on how to fine-tune the Mistral 7B large language model using QLoRA with the Axolotl library, focusing on managing limited GPU resources for efficient training. It covers environment setup, dataset creation, configuration of QLoRA hyperparameters, the fine-tuning process, and testing the fine-tuned model.

2025-02-10 Tags: mistral 7b, qlora, axolotl, fine-tuning, llm, training, lora by klotz

