SemanticScuttle - klotz.me » Tags: llama.cpp+openai

Tags: llama.cpp* + openai*

0 bookmark(s) - Sort by: Date ↓ / Title /

120B runs awesome on just 8GB VRAM!

A user demonstrates how to run a 120B model efficiently on hardware with only 8GB VRAM by offloading MOE layers to CPU and keeping only attention layers on GPU, achieving high performance with minimal VRAM usage.

2025-08-21 Tags: 120b, moe, llama.cpp, gpt-oss, localllama, gpt-oss-120b, openai, llm by klotz
guide : running gpt-oss with llama.cpp

2025-08-19 Tags: gpt-oss, -20b, openai, github, llama.cpp, llm, ggml by klotz
LLooM: Leverage raw LLM logits to weave threads

This page provides information about LLooM, a tool that uses raw LLM logits to weave threads in a probabilistic way. It includes instructions on how to use LLooM with various environments, such as vLLM, llama.cpp, and OpenAI. The README also explains the parameters and configurations for LLooM.

2024-07-04 Tags: lloom, llm, logits, vllm, llama.cpp, openai, greedy decoding, beamsearch, github by klotz
abetlen/llama-cpp-python: Python bindings for llama.cpp

llama-cpp-python offers a web server which aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp compatible models with any OpenAI compatible client (language libraries, services, etc).

2023-06-09 Tags: openai, llama.cpp, llama, python, api, foss, github by klotz

First / Previous / Next / Last / Page 1 of 0