SemanticScuttle - klotz.me » Tags: rtx 3090



This is GPT-OSS 120b on Ollama, running on an i7

A user shares their experience running the GPT-OSS 120b model on Ollama with an i7 6700, 64GB DDR4 RAM, an RTX 3090, and a 1TB SSD. They report slow initial token generation but acceptable overall performance, showing that the model can run on a relatively modest setup. The discussion also covers comparisons with other hardware configurations, optimization with llama.cpp, and the model's quality.

>I have a 3090 with 64gb ddr4 3200 RAM and am getting around 50 t/s prompt processing speed and 15 t/s generation speed using the following:
>
>`llama-server -m <path to gpt-oss-120b> --ctx-size 32768 --temp 1.0 --top-p 1.0 --jinja -ub 2048 -b 2048 -ngl 99 -fa 'on' --n-cpu-moe 24`
> This about fills up my VRAM and RAM almost entirely. For more wiggle room for other applications use `--n-cpu-moe 26`.
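
The quoted command can be wrapped in a small launcher so that the `--n-cpu-moe` value is easy to adjust. This is a minimal sketch, not the commenter's own tooling: the function name `build_llama_server_cmd` and the model path are hypothetical placeholders, and only the flags that appear in the quote are used.

```python
import shlex


def build_llama_server_cmd(model_path, n_cpu_moe=24, ctx_size=32768):
    """Return the argument list for the llama-server call quoted above.

    model_path is a placeholder; the quote elides the real path.
    n_cpu_moe defaults to 24 as in the quote; the commenter suggests 26
    to leave more memory free for other applications.
    """
    return [
        "llama-server",
        "-m", model_path,
        "--ctx-size", str(ctx_size),
        "--temp", "1.0",
        "--top-p", "1.0",
        "--jinja",
        "-ub", "2048",
        "-b", "2048",
        "-ngl", "99",
        "-fa", "on",
        "--n-cpu-moe", str(n_cpu_moe),
    ]


if __name__ == "__main__":
    # Hypothetical path, used only to show the resulting command line.
    cmd = build_llama_server_cmd("/path/to/gpt-oss-120b.gguf", n_cpu_moe=26)
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if the model path contains spaces; the list can be passed directly to `subprocess.run`.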

2025-09-01 Tags: gpt-oss, 120b, reddit, localllama, llm, inference, rtx 3090, llama.cpp, hardware by klotz

3 ways to turn an old GPU into your own eGPU

Learn how to use a spare GPU to create an external graphics card (eGPU) for your laptop or PC gaming handheld, including using prebuilt enclosures, DIY Thunderbolt enclosures, or OCuLink enclosures.

2025-01-17 Tags: gpu, egpu, thunderbolt, oculink, rtx 3090, hardware by klotz

The GPU Cloud Built for AI

Backprop provides powerful and affordable GPU instances for AI development, with pre-built environments, pay-as-you-go pricing, and fast internet.

2024-08-24 Tags: gpu, cloud, haas, saas, ai, rtx 3090, a100, backprop.co by klotz

Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands

A startup called Backprop has demonstrated that a single Nvidia RTX 3090 GPU, released in 2020, can serve a modest large language model (LLM) such as Llama 3.1 8B to over 100 concurrent users with acceptable throughput. Since only a fraction of users typically send requests at the same moment, this suggests that expensive enterprise GPUs may not be necessary to scale LLMs to a few thousand users.
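
One way to read the step from roughly 100 concurrent users to a few thousand total users is as a back-of-the-envelope ratio. The sketch below is illustrative only: the function name is hypothetical, and the "1 in 20 users active at once" ratio is an assumed value, not a figure from the benchmark.

```python
def estimate_total_users(concurrent_capacity, users_per_active_user):
    """Estimate how many total users a given concurrent capacity can cover.

    concurrent_capacity: simultaneous users the GPU serves acceptably
        (about 100 in the summary above).
    users_per_active_user: assumed ratio of total users to users who are
        sending a request at any one moment (hypothetical input).
    """
    if concurrent_capacity < 0 or users_per_active_user < 1:
        raise ValueError("capacity must be >= 0 and ratio must be >= 1")
    return concurrent_capacity * users_per_active_user


# Assumption for illustration: 1 in 20 users is active at a given moment.
print(estimate_total_users(100, 20))  # 2000
```

The result scales linearly with the assumed ratio, so the "few thousand users" conclusion depends heavily on how bursty the real traffic is.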