Tags: swe-bench*


  1. Unsloth AI presents performance benchmarks for Qwen3.6-35B-A3B GGUF quantizations, claiming state-of-the-art results in mean KL divergence across most model sizes. The discussion also covers community analysis of SWE-bench Verified performance, with some users noting unexpected discrepancies between Qwen3.5 and Qwen3.6 quantization results on coding tasks.
    Key points:
    - Unsloth ranks first in 21 of 22 model sizes for mean KL divergence.
    - Community debate over SWE-bench testing methodology and sample sizes.
    - Reported performance variations between different quantization levels (Q4, Q5, Q6, Q8).
    - Discussion on system prompt adherence and error rates in coding benchmarks.
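    The mean-KL metric cited above compares a quantized model's next-token distribution against the full-precision model's, averaged over token positions. A minimal sketch of that computation (the function name and log-probability input format are assumptions for illustration, not Unsloth's actual evaluation harness):

    ```python
    import math

    def mean_kl_divergence(p_logprobs, q_logprobs):
        """Mean KL(P || Q) per token position.

        P: full-precision model's next-token distribution (log probs).
        Q: quantized model's next-token distribution (log probs).
        Each argument is a list of per-position rows of log probabilities
        over the vocabulary; lower mean KL means the quantization
        preserves the original distribution more faithfully.
        """
        total = 0.0
        for p_row, q_row in zip(p_logprobs, q_logprobs):
            # KL(P||Q) = sum_i p_i * (log p_i - log q_i)
            total += sum(math.exp(lp) * (lp - lq)
                         for lp, lq in zip(p_row, q_row))
        return total / len(p_logprobs)
    ```

    A KL of zero means the two models assign identical probabilities; the ranking claim above is about which quantization keeps this average closest to zero.
    
    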
  2. Qwen3-Coder-Next is an 80-billion-parameter language model that activates only 3 billion parameters during inference, achieving strong coding capabilities through agentic training with verifiable task synthesis and reinforcement learning. It is an open-weight model specialized for coding agents, and both base and instruction-tuned versions are released to support research and real-world coding agent development.
  3. All Hands AI has released OpenHands CodeAct 2.1, an open-source software development agent that can resolve over 50% of the real GitHub issues in SWE-Bench. The agent reaches this milestone using Anthropic’s Claude-3.5 model, function calling, and improved directory traversal.
    2024-11-02, by klotz
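The function-calling pattern mentioned in the CodeAct entry can be sketched as a simple loop: the model either emits a tool call (which the agent executes and feeds back) or a final answer. All names and message shapes below are hypothetical illustrations, not OpenHands' actual interfaces:

```python
# Minimal agentic function-calling loop (all API names are hypothetical).
def run_agent(llm, tools, task, max_steps=10):
    """Drive an LLM with tool access until it returns a final answer.

    llm:   callable taking the message history, returning a dict with
           either a "tool_call" ({"name", "args"}) or a final "content".
    tools: mapping from tool name to a plain Python callable.
    """
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm(history)
        if reply.get("tool_call") is None:
            return reply["content"]          # model produced a final answer
        call = reply["tool_call"]
        result = tools[call["name"]](**call["args"])
        # Feed the tool's output back so the model can continue reasoning.
        history.append({"role": "tool", "content": str(result)})
    return None                              # step budget exhausted
```

In a real coding agent the tools would be things like file reads, directory listings, and shell commands, and the loop would also append the model's own replies to the history; the sketch keeps only the core call-execute-observe cycle.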


SemanticScuttle - klotz.me: tagged with "swe-bench"
