Tags: vram*


  1. A user reports slow performance with Qwen3-Coder-Next on a local two-GPU setup (RTX 5060 Ti and RTX 3060 with a tensor-split configuration), seeing only 2-15 tokens/second along with high swap usage, despite otherwise capable hardware. The post details the hardware and parameters used and asks for help troubleshooting the issue.
  2. The RTX 3090's combination of performance and 24GB of VRAM makes it a better choice for local LLM and AI workloads than newer Nvidia Blackwell GPUs such as the RTX 5070 and even the RTX 5080, which are held back by smaller VRAM capacities and their pricing.
    2026-02-07 by klotz
  3. A Ruby script that calculates VRAM requirements for large language models (LLMs) from the model's size, bits per weight (bpw), and context length. Given available VRAM, it can instead solve for the maximum context length or the best bpw (see the sketch after this list).
  4. A Hugging Face Space hosting the LLM-Model-VRAM-Calculator, a tool for estimating the VRAM a specific machine learning model requires.
    2024-06-04 by klotz
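
All three VRAM items above rest on the same two dominant terms: weight storage, which scales with parameter count times bits per weight, and the KV cache, which grows linearly with context length. The same arithmetic also explains the swap pressure in item 1: once weights plus cache exceed the VRAM on both sides of the tensor split, data spills into system RAM. Below is a minimal Python sketch of that estimate; all model dimensions in it are illustrative assumptions, not values taken from either bookmarked calculator.

def weights_gib(n_params_b: float, bpw: float) -> float:
    """VRAM for the weights: parameters x bits-per-weight / 8 bits per byte."""
    return n_params_b * 1e9 * bpw / 8 / 2**30

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache: two tensors (K and V) per layer, fp16 elements by default."""
    return (2 * n_layers * context_len * n_kv_heads * head_dim
            * bytes_per_elem) / 2**30

def max_context(vram_gib: float, n_params_b: float, bpw: float,
                n_layers: int, n_kv_heads: int, head_dim: int,
                overhead_gib: float = 1.5, bytes_per_elem: int = 2) -> int:
    """Invert the estimate: the largest context that fits in vram_gib."""
    free = vram_gib - weights_gib(n_params_b, bpw) - overhead_gib
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem / 2**30
    return max(0, int(free / per_token))

def split_across_gpus(total_gib: float, ratios: list[float]) -> list[float]:
    """Distribute the total across GPUs in tensor-split proportions."""
    s = sum(ratios)
    return [total_gib * r / s for r in ratios]

if __name__ == "__main__":
    # Hypothetical 8B model at 4.5 bpw with a 32k context.
    need = weights_gib(8, 4.5) + kv_cache_gib(32, 8, 128, 32768)
    print(f"estimated footprint: {need:.1f} GiB")
    # On a single 24 GiB card (e.g. an RTX 3090, as in item 2):
    print(f"max context in 24 GiB: {max_context(24, 8, 4.5, 32, 8, 128)}")
    # Split across a hypothetical 16 GiB + 12 GiB pair, as in item 1:
    print([f"{g:.1f} GiB" for g in split_across_gpus(need, [16, 12])])

Note that real runtimes also allocate activation buffers and framework overhead, which the flat overhead_gib term only approximates; the bookmarked calculators presumably carry their own fudge factors for the same reason.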
