An in-depth look at the architecture of OpenAI's GPT-OSS models, detailing tokenization, embeddings, transformer blocks, Mixture-of-Experts (MoE) layers, the attention mechanism (grouped-query attention with rotary position embeddings), and quantization techniques.
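To make the listed pieces concrete, here is a minimal PyTorch sketch (not the official GPT-OSS code) of one decoder block combining grouped-query attention with RoPE and a top-k Mixture-of-Experts feed-forward; all dimensions, head counts, and expert counts are illustrative placeholders rather than the real GPT-OSS hyperparameters.

```python
# Illustrative sketch only: a toy decoder block with grouped-query attention
# (GQA), rotary position embeddings (RoPE), and a top-k Mixture-of-Experts
# feed-forward. Hyperparameters are placeholders, not GPT-OSS values.
# Requires PyTorch >= 2.4 (for nn.RMSNorm).
import torch
import torch.nn as nn
import torch.nn.functional as F


def apply_rope(x, base=10000.0):
    # x: (batch, heads, seq, head_dim). Rotate channel pairs by a
    # position-dependent angle -- the common "rotate-half" RoPE variant.
    _, _, s, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()        # (seq, half), broadcast over batch/heads
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class GQAttention(nn.Module):
    # Grouped-query attention: several query heads share one key/value head,
    # which shrinks the KV cache (n_kv_heads < n_heads).
    def __init__(self, dim, n_heads=8, n_kv_heads=2):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, s, _ = x.shape
        q = self.wq(x).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = apply_rope(q), apply_rope(k)
        rep = self.n_heads // self.n_kv_heads    # expand shared KV heads to match queries
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, s, -1))


class MoEFeedForward(nn.Module):
    # A learned router sends each token to its top-k experts; only those
    # experts run, so active parameters stay far below total parameters.
    def __init__(self, dim, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):
        weights, idx = self.router(x).topk(self.top_k, dim=-1)   # (batch, seq, k)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


class DecoderBlock(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.norm1, self.norm2 = nn.RMSNorm(dim), nn.RMSNorm(dim)
        self.attn, self.moe = GQAttention(dim), MoEFeedForward(dim)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))         # pre-norm residual attention
        return x + self.moe(self.norm2(x))       # pre-norm residual MoE FFN


x = torch.randn(1, 16, 256)                      # (batch, seq, dim) dummy activations
print(DecoderBlock()(x).shape)                   # torch.Size([1, 16, 256])
```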
A user demonstrates how to run a 120B model efficiently on hardware with only 8 GB of VRAM by offloading the MoE expert layers to the CPU and keeping just the attention layers on the GPU, achieving high performance while using minimal VRAM.
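The placement split is easy to picture with a toy sketch. The PyTorch example below illustrates the idea only (it is not the user's actual setup or any inference runtime's internals): the comparatively small attention and router weights live on the GPU, the parameter-heavy MoE experts stay in CPU RAM, and only the much smaller activations move between devices. All module names and sizes are hypothetical.

```python
# Illustrative sketch of MoE offloading: attention on GPU, experts on CPU.
# Sizes, module names, and the attention variant are placeholders.
import torch
import torch.nn as nn


class HybridMoEBlock(nn.Module):
    """One decoder block with attention on the GPU and MoE experts on the CPU."""

    def __init__(self, dim=256, n_experts=8, top_k=2, gpu="cuda", cpu="cpu"):
        super().__init__()
        self.gpu, self.cpu, self.top_k = gpu, cpu, top_k
        # Attention and router weights are small relative to the experts -> keep on GPU.
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True).to(gpu)
        self.router = nn.Linear(dim, n_experts).to(gpu)
        # Expert FFNs dominate the parameter count -> park them in CPU RAM.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)).to(cpu)
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x lives on the GPU
        h, _ = self.attn(x, x, x, need_weights=False)
        x = x + h
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        x_cpu = x.to(self.cpu)                     # move activations (small) to the CPU
        out = torch.zeros_like(x_cpu)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., slot] == e).to(self.cpu)
                if mask.any():
                    out[mask] += weights[..., slot].to(self.cpu)[mask].unsqueeze(-1) * expert(x_cpu[mask])
        return x + out.to(self.gpu)                # bring the MoE output back to the GPU


if torch.cuda.is_available():
    block = HybridMoEBlock()
    tokens = torch.randn(1, 16, 256, device="cuda")
    print(block(tokens).shape)                     # torch.Size([1, 16, 256])
else:
    print("CUDA not available; the sketch needs a GPU to demonstrate the split.")
```

The split is workable because an MoE model activates only a few experts per token, so the CPU-side matrix multiplies stay manageable while the GPU handles the attention layers and KV cache, which fit comfortably in a small VRAM budget.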