Tags: mixture of experts* + quantization*

2 bookmark(s)

  1. An in-depth look at the architecture of OpenAI's GPT-OSS models, detailing tokenization, embeddings, transformer blocks, Mixture of Experts, attention mechanisms (GQA and RoPE), and quantization techniques.
  2. This article shares 7 lessons the author learned while self-hosting Large Language Models (LLMs): the importance of memory bandwidth, quantization, electricity costs, hardware options beyond Nvidia, prompt engineering, Mixture of Experts models, and starting with simpler tools like LM Studio.


