SemanticScuttle - klotz.me » Tags: python+llm+llama-3

Meet oLLM: A Lightweight Python Library that brings 100K-Context LLM Inference to 8 GB Consumer GPUs via SSD Offload—No Quantization Required

oLLM is a Python library for running large-context Transformers on NVIDIA GPUs by offloading weights and KV-cache to SSDs. It supports models like Llama-3, GPT-OSS-20B, and Qwen3-Next-80B, enabling up to 100K tokens of context on 8-10 GB GPUs without quantization.

2025-09-30 Tags: ollm, llm, inference, python, huggingface, pytorch, llama-3, gpt-oss, qwen3-next by klotz

Small Language Models: Using 3.8B Phi-3 and 8B Llama-3 Models on a PC and Raspberry Pi

In this article, Dmitrii Eliuseev tests two small language models, the 3.8B Phi-3 and the 8B Llama-3, on a PC and a Raspberry Pi using LlamaCpp and ONNX.

2024-06-21 Tags: llm, phi-3, llama-3, llamacpp, onnx, python, iot, raspberry pi by klotz

