SemanticScuttle - klotz.me

Tags: unsloth*

0 bookmark(s) - Sort by: Date ↓ / Title /

This repository provides the GGUF quantized weights for Qwen3.6-27B, a flagship-level coding model designed for stability and real-world utility. The model features significant upgrades in agentic coding capabilities, allowing it to handle frontend workflows and repository-level reasoning with high precision. It also introduces thinking preservation, which enables the model to retain reasoning context from historical messages to improve iterative development.
Key technical highlights:
* Native context length of 262,144 tokens, extensible up to 1,010,000 via RoPE scaling (YaRN).
* Enhanced tool-calling capabilities for complex agentic tasks.
* Support for multimodal inputs including images and video.
* Optimized for various inference frameworks like SGLang, vLLM, and KTransformers.

2026-04-23 Tags: qwen3.6, gguf, unsloth, llm quantization, agentic coding, multimodal ai by klotz

Qwen3.6 GGUF Benchmarks

Unsloth AI presents performance benchmarks for Qwen3.6-35B-A3B GGUF quantizations, claiming state-of-the-art results in mean KL divergence across most model sizes. The discussion includes community analysis regarding SWE-bench Verified performance, where some users noted unexpected discrepancies between Qwen3.5 and Qwen3.6 quantization results during coding tasks.
Key points:
- Unsloth ranks first in 21 of 22 model sizes for mean KL divergence.
- Community debate over SWE-bench testing methodology and sample sizes.
- Reported performance variations between different quantization levels (Q4, Q5, Q6, Q8).
- Discussion on system prompt adherence and error rates in coding benchmarks.

2026-04-18 Tags: unsloth, qwen3.6, gguf, benchmarks, quantization, swe-bench, llm performance by klotz

Gemma 4

This document details how to run Google's Gemma 4 models locally, including the E2B, E4B, 26B-A4B, and 31B variants. Gemma 4 is a family of open models supporting over 140 languages and up to 256K context, available in both dense and MoE configurations. The E2B and E4B models support image and audio input. These models can be run locally on your device and fine-tuned using Unsloth Studio. The document outlines hardware requirements, recommended settings, and best practices for prompting and multimodal use, including guidance on context length and thinking mode.

2026-04-02 Tags: gemma 4, llm, local ai, unsloth, models, inference, fine-tuning, llama.cpp, multimodal by klotz

gemma-4-31B-it-GGUF

This Hugging Face page details the Gemma 4 31B-it model, an open-weights multimodal model created by Google DeepMind. Gemma 4 can process both text and image inputs, generating text outputs, with smaller models also supporting audio. It comes in various sizes (E2B, E4B, 26B A4B, and 31B) allowing for deployment on diverse hardware, from phones to servers.
The model boasts a context window of up to 256K tokens and supports over 140 languages. It utilizes dense and Mixture-of-Experts (MoE) architectures, excelling in tasks like text generation, coding, and reasoning. The page provides details on model data, training, ethics, usage, limitations, and best practices, along with code snippets for getting started with Transformers.

2026-04-02 Tags: google, huggingface, unsloth, llm, image-text-to-text, gguf, gemma4, gemma, imatrix, conversational by klotz

Qwen3.5 GGUF Benchmarks

This article details benchmarks for Unsloth Dynamic GGUFs of the Qwen3.5 model, including analysis of perplexity, KL divergence, and MXFP4. It covers performance across different bit widths and quant types, highlighting the impact of Imatrix and the limitations of certain quantization approaches. Full benchmark data is also provided.

2026-03-01 Tags: qwen3.5, gguf, benchmarks, quantization, perplexity, kl divergence, mxfp4, imatrix, llm, inference, dynamic quantization, unsloth by klotz

Tool Calling Guide for Local LLMs

This guide explains how to use tool calling with local LLMs, including examples with mathematical, story, Python code, and terminal functions, using llama.cpp, llama-server, and OpenAI endpoints.

2026-02-06 Tags: tool calling, llm, unsloth, llama.cpp, llama-server, openai, function calling, python, terminal, inference by klotz

Unsloth Dynamic GGUFs on Aider Polyglot

This article details the performance of Unsloth Dynamic GGUFs on the Aider Polyglot benchmark, showcasing how it can quantize LLMs like DeepSeek-V3.1 to as low as 1-bit while outperforming models like GPT-4.5 and Claude-4-Opus. It also covers benchmark setup, comparisons to other quantization methods, and chat template bug fixes.

2025-10-13 Tags: unsloth, gguf, aider polyglot, llm, quantization, deepseek-v3.1, gpt-4, claude-4, model compression, fine-tuning, inference by klotz

Gemma 3: How to Run & Fine-tune

How to run Gemma 3 effectively with our GGUFs on llama.cpp, Ollama, Open WebUI and how to fine-tune with Unsloth! This page details running Gemma 3 on various platforms, including phones, and fine-tuning it using Unsloth, addressing potential issues with float16 precision and providing optimal configuration settings.

2025-08-16 Tags: gemma 3, llm, fine-tuning, llama.cpp, unsloth, gguf, gpu, colab, vision, audio, oobabooga by klotz

Devstral: How to Run & Fine-tune | Unsloth Documentation

Learn how to run and fine-tune Mistral Devstral 1.1, including Small-2507 and 2505. This guide covers official recommended settings, tutorials for running Devstral in Ollama and llama.cpp, experimental vision support, and fine-tuning with Unsloth.

2025-07-11 Tags: devstral, mistral, unsloth, fine-tuning, llm, ollama, llama.cpp, vision by klotz

Tutorial: How to Run & Fine-tune Gemma 3

This document details how to run and fine-tune Gemma 3 models (1B, 4B, 12B, and 27B) using Unsloth, covering setup with Ollama and llama.cpp, and addressing potential float16 precision issues. It also highlights Unsloth's unique ability to run Gemma 3 in float16 on machines like Colab notebooks with Tesla T4 GPUs.

2025-04-09 Tags: gemma 3, unsloth, llama.cpp, ollama, fine-tuning, llm, inference by klotz

First / Previous / Next / Last / Page 1 of 0

SemanticScuttle - klotz.me

Tags: unsloth*

Linked Tags

Related Tags