Tags: python* + huggingface* + llm* + gpt-oss*


  1. oLLM is a Python library for running large-context Transformers on NVIDIA GPUs by offloading weights and KV-cache to SSDs. It supports models like Llama-3, GPT-OSS-20B, and Qwen3-Next-80B, enabling up to 100K tokens of context on 8-10 GB GPUs without quantization.
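To make the memory pressure behind this offloading concrete, here is a rough back-of-the-envelope sketch of how large an unquantized KV-cache gets at 100K tokens. It is not part of oLLM; the function name `kv_cache_bytes` is hypothetical, and the model shape (32 layers, 8 key/value heads, head dimension 128, 16-bit values) is an assumption based on the published Llama-3-8B configuration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_value=2):
    """Bytes needed to hold keys and values for num_tokens across all layers."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_value


# Assumed Llama-3-8B shape: 32 layers, 8 KV heads (grouped-query attention),
# head dimension 128, 16-bit values (2 bytes each).
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=100_000)
print(f"KV-cache at 100K tokens: {size / 1e9:.1f} GB")  # prints 13.1 GB
```

Under these assumptions the cache alone is larger than the memory of an 8-10 GB GPU, before counting the model weights, which is consistent with the entry's point that both weights and KV-cache are moved to SSD.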


