Tags: vram*


  1. A user reports slow performance with Qwen3-Coder-Next on a local two-GPU setup (RTX 5060 Ti and RTX 3060 with a tensor-split configuration), seeing only 2-15 tokens/second along with high swap usage, despite otherwise capable hardware. The post details the hardware and parameters used and asks for help troubleshooting the issue.
  2. The RTX 3090's combination of performance and 24GB of VRAM makes it a better choice for local LLM and AI workloads than newer Nvidia Blackwell GPUs such as the RTX 5070 and even the RTX 5080, which are held back by smaller VRAM capacities and their pricing.
    2026-02-07 by klotz
  3. A Ruby script that calculates VRAM requirements for large language models (LLMs) from the model's size, bits per weight (bpw), and context length. Given available VRAM, it can instead solve for the maximum context length or the best bpw (see the sketch after this list).
  4. A Hugging Face Space hosting the LLM-Model-VRAM-Calculator, a tool for estimating the VRAM a specific machine learning model requires.
    2024-06-04 by klotz
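
All three VRAM items above rest on the same two dominant terms: weight storage, which scales with parameter count times bits per weight, and the KV cache, which grows linearly with context length. The same arithmetic also explains the swap pressure in item 1: once weights plus cache exceed the VRAM on both sides of the tensor split, data spills into system RAM. Below is a minimal Python sketch of that estimate; all model dimensions in it are illustrative assumptions, not values taken from either bookmarked calculator.

def weights_gib(n_params_b: float, bpw: float) -> float:
    """VRAM for the weights: parameters x bits-per-weight / 8 bits per byte."""
    return n_params_b * 1e9 * bpw / 8 / 2**30

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache: two tensors (K and V) per layer, fp16 elements by default."""
    return (2 * n_layers * context_len * n_kv_heads * head_dim
            * bytes_per_elem) / 2**30

def max_context(vram_gib: float, n_params_b: float, bpw: float,
                n_layers: int, n_kv_heads: int, head_dim: int,
                overhead_gib: float = 1.5, bytes_per_elem: int = 2) -> int:
    """Invert the estimate: the largest context that fits in vram_gib."""
    free = vram_gib - weights_gib(n_params_b, bpw) - overhead_gib
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem / 2**30
    return max(0, int(free / per_token))

def split_across_gpus(total_gib: float, ratios: list[float]) -> list[float]:
    """Distribute the total across GPUs in tensor-split proportions."""
    s = sum(ratios)
    return [total_gib * r / s for r in ratios]

if __name__ == "__main__":
    # Hypothetical 8B model at 4.5 bpw with a 32k context.
    need = weights_gib(8, 4.5) + kv_cache_gib(32, 8, 128, 32768)
    print(f"estimated footprint: {need:.1f} GiB")
    # On a single 24 GiB card (e.g. an RTX 3090, as in item 2):
    print(f"max context in 24 GiB: {max_context(24, 8, 4.5, 32, 8, 128)}")
    # Split across a hypothetical 16 GiB + 12 GiB pair, as in item 1:
    print([f"{g:.1f} GiB" for g in split_across_gpus(need, [16, 12])])

Note that real runtimes also allocate activation buffers and framework overhead, which the flat overhead_gib term only approximates; the bookmarked calculators presumably carry their own fudge factors for the same reason.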
