Tags: llama.cpp* + openai* + llm* + localllama*


  1. A user demonstrates how to run a 120B MoE model efficiently on hardware with only 8 GB of VRAM by offloading the MoE expert layers to CPU and keeping only the attention layers on the GPU, preserving most of the generation speed while using very little VRAM. A sketch of such a llama.cpp invocation follows.
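     A minimal sketch, not the bookmarked user's exact command: llama.cpp's --override-tensor (-ot) flag pins tensors whose names match a regex to a given backend, so MoE expert FFN weights can be kept in CPU RAM while everything else goes to the GPU. The model path and quantization below are placeholders, not taken from the post.

         #!/bin/sh
         # Offload all layers to GPU by default, then override the MoE
         # expert tensors (named like blk.N.ffn_gate_exps.weight) to CPU.
         ./llama-server \
           -m ./models/moe-120b-Q4_K_M.gguf \
           --n-gpu-layers 99 \
           -ot ".ffn_.*_exps.=CPU" \
           -c 8192 \
           --port 8080

     This works because an MoE model activates only a few experts per token: the bulky expert weights can sit in system RAM, while the attention layers and KV cache, which are touched on every token, fit comfortably in 8 GB of VRAM. llama-server also exposes an OpenAI-compatible API on the given port, matching the "openai" tag above.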


