Tags: llama.cpp* + openai*

0 bookmark(s) - Sort by: Date ↓ / Title /

  1. A user demonstrates how to run a 120B model efficiently on hardware with only 8GB VRAM by offloading MOE layers to CPU and keeping only attention layers on GPU, achieving high performance with minimal VRAM usage.
  2. 2025-08-19 Tags: , , , , , , by klotz
  3. This page provides information about LLooM, a tool that uses raw LLM logits to weave threads in a probabilistic way. It includes instructions on how to use LLooM with various environments, such as vLLM, llama.cpp, and OpenAI. The README also explains the parameters and configurations for LLooM.
  4. llama-cpp-python offers a web server which aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp compatible models with any OpenAI compatible client (language libraries, services, etc).
    2023-06-09 Tags: , , , , , , by klotz

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: tagged with "llama.cpp+openai"

About - Propulsed by SemanticScuttle