klotz: gguf* + localllama*

  1. A user seeks advice on deploying a new server with 4x H100 GPUs (320 GB VRAM total) for on-premise AI workloads. They are considering a Kubernetes-based deployment using RKE2, the NVIDIA GPU Operator, and serving tools such as vLLM, llama.cpp, and LiteLLM, and are also weighing GPU pass-through under a hypervisor. The post details their current infrastructure and asks for gotchas and best practices. (A minimal serving sketch follows after this list.)
  2. Ollama now supports pulling GGUF models directly from Hugging Face, making it easier to run AI models locally, fully offline once downloaded. The GGUF format makes quantized models practical on modest consumer hardware. (A short usage sketch follows after this list.)
    2024-10-24 Tags: gguf, localllama, by klotz
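
For the first bookmark, a minimal client-side sketch of the serving stack it describes: vLLM (optionally fronted by a LiteLLM proxy) exposes an OpenAI-compatible HTTP API, so on-prem clients can talk to it with the standard openai Python package. The base URL and model name below are assumptions for illustration, not values from the post.

    # Minimal sketch: query an OpenAI-compatible endpoint exposed by vLLM
    # (or a LiteLLM proxy in front of it). Host, port, and model name are
    # hypothetical placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://vllm.internal:8000/v1",  # hypothetical in-cluster service address
        api_key="EMPTY",                          # vLLM accepts any key unless auth is configured
    )

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",  # example model id; use whatever the server loads
        messages=[{"role": "user", "content": "Summarize GPU pass-through trade-offs."}],
        max_tokens=200,
    )
    print(response.choices[0].message.content)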
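
For the second bookmark, a short sketch of the workflow it describes, using the ollama Python client. The hf.co/<user>/<repo>:<quant> model reference is Ollama's syntax for pulling GGUF files directly from Hugging Face; the specific repository and quantization tag below are illustrative, and a local Ollama daemon is assumed to be running.

    # Sketch: pull a GGUF model straight from Hugging Face through Ollama,
    # then chat with it locally. Requires a running Ollama daemon and the
    # `ollama` Python package. The repo and quant tag are example values.
    import ollama

    model = "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M"

    ollama.pull(model)  # downloads the GGUF once; subsequent runs are fully local
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": "What is the GGUF format?"}],
    )
    print(response["message"]["content"])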
