Tags: mixture of experts* + quantization*

2 bookmark(s)

  1. An in-depth look at the architecture of OpenAI's GPT-OSS models, detailing tokenization, embeddings, transformer blocks, Mixture of Experts, attention mechanisms (GQA and RoPE), and quantization techniques.
  2. This article shares 7 lessons the author learned while self-hosting Large Language Models (LLMs): the importance of memory bandwidth, quantization, electricity costs, hardware options beyond Nvidia, prompt engineering, Mixture of Experts models, and starting with simpler tools like LM Studio.


