SemanticScuttle - klotz.me » Tags: python+huggingface+llm+gpt-oss

Meet oLLM: A Lightweight Python Library that brings 100K-Context LLM Inference to 8 GB Consumer GPUs via SSD Offload—No Quantization Required

oLLM is a Python library for running large-context Transformers on NVIDIA GPUs by offloading weights and KV-cache to SSDs. It supports models like Llama-3, GPT-OSS-20B, and Qwen3-Next-80B, enabling up to 100K tokens of context on 8-10 GB GPUs without quantization.

2025-09-30 Tags: ollm, llm, inference, python, huggingface, pytorch, llama-3, gpt-oss, qwen3-next by klotz
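
To illustrate why the KV cache has to leave the GPU at this context length, here is a back-of-the-envelope size estimate. It is a minimal sketch, not oLLM code: the helper `kv_cache_bytes` is hypothetical, and the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) is an assumed Llama-3-8B-style configuration that does not come from the bookmarked description.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_value=2):
    """Bytes needed to hold the keys and values for num_tokens tokens."""
    # Leading factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


# Assumed Llama-3-8B-style shape (illustrative, not taken from the source).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, num_tokens=1)
full_context = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, num_tokens=100_000)

print(f"per token:   {per_token} bytes")
print(f"100K tokens: {full_context / 1e9:.1f} GB")
```

Under these assumptions the unquantized cache alone is larger than an 8 GB card before any weights are loaded, which is the gap that SSD offloading is meant to close.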
