klotz: rtx 4090*


  1. A comprehensive technical guide on setting up a high-performance local large language model environment for agentic coding tasks. The author demonstrates how to run a quantized Qwen3.5-27B model on a remote RTX 4090 workstation and access it from a MacBook using Tailscale, integrating the setup with OpenCode and Codex.
    Key topics include:
    * Step-by-step llama.cpp build configuration for CUDA support.
    * Using Tailscale to create a secure network between client and GPU machine.
    * Optimizing VRAM usage through specific quantization (UD-Q4_K_XL) and context size management.
    * Implementing a corrected chat template to prevent tool-calling errors in agentic workflows.
    * Performance insights regarding hybrid architectures and KV cache precision.
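    The build-and-serve flow summarized above can be sketched roughly as follows. This is a hedged sketch, not the guide's exact configuration: the model filename, context size, and port below are placeholders, and the actual quant used is the UD-Q4_K_XL build mentioned above.

    ```shell
    # Build llama.cpp with CUDA support on the RTX 4090 machine
    # (sketch; the guide's exact build flags may differ):
    #   git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
    #   cmake -B build -DGGML_CUDA=ON
    #   cmake --build build --config Release -j

    # Hypothetical model path; stands in for the guide's UD-Q4_K_XL quant.
    MODEL="models/qwen-27b-UD-Q4_K_XL.gguf"

    # Serve the model: -ngl 99 offloads all layers to the GPU, -c caps the
    # context window, and quantizing the KV cache to q8_0 trades a little
    # precision for VRAM. Binding to 0.0.0.0 lets the MacBook reach the
    # server over the Tailscale network.
    SERVE_CMD="./build/bin/llama-server -m $MODEL -ngl 99 -c 32768 \
      --host 0.0.0.0 --port 8080 --cache-type-k q8_0 --cache-type-v q8_0"
    echo "$SERVE_CMD"
    ```

    From the MacBook, OpenCode or Codex would then be pointed at the server's Tailscale address (e.g. `http://<tailscale-ip>:8080/v1`), since llama-server exposes an OpenAI-compatible API there.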



About - Propulsed by SemanticScuttle