Tags: llama.cpp* + openai* + llm* + localllama*


  1. A user demonstrates how to run a 120B MoE model efficiently on hardware with only 8 GB of VRAM by offloading the MoE expert layers to CPU and keeping only the attention layers on the GPU, preserving most of the generation speed while using very little VRAM. A sketch of such a llama.cpp invocation follows.
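     A minimal sketch, not the bookmarked user's exact command: llama.cpp's --override-tensor (-ot) flag pins tensors whose names match a regex to a given backend, so MoE expert FFN weights can be kept in CPU RAM while everything else goes to the GPU. The model path and quantization below are placeholders, not taken from the post.

         #!/bin/sh
         # Offload all layers to GPU by default, then override the MoE
         # expert tensors (named like blk.N.ffn_gate_exps.weight) to CPU.
         ./llama-server \
           -m ./models/moe-120b-Q4_K_M.gguf \
           --n-gpu-layers 99 \
           -ot ".ffn_.*_exps.=CPU" \
           -c 8192 \
           --port 8080

     This works because an MoE model activates only a few experts per token: the bulky expert weights can sit in system RAM, while the attention layers and KV cache, which are touched on every token, fit comfortably in 8 GB of VRAM. llama-server also exposes an OpenAI-compatible API on the given port, matching the "openai" tag above.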


