A comprehensive technical guide on setting up a high-performance local large language model environment for agentic coding tasks. The author demonstrates how to run a quantized Qwen3.5-27B model on a remote RTX 4090 workstation and access it from a MacBook using Tailscale, integrating the setup with OpenCode and Codex.
Key topics include:
* Step-by-step llama.cpp build configuration for CUDA support.
* Using Tailscale to create a secure network between client and GPU machine.
* Optimizing VRAM usage through a specific quantization (UD-Q4_K_XL) and context-size management.
* Implementing a corrected chat template to prevent tool-calling errors in agentic workflows.
* Performance insights regarding hybrid architectures and KV cache precision.
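The build step in the list above can be sketched with llama.cpp's standard CMake-based CUDA build; the clone URL and flags are the project's documented defaults, and the CUDA toolkit is assumed to be installed on the workstation:

```shell
# Clone llama.cpp and build it with the CUDA backend enabled.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# GGML_CUDA=ON compiles the CUDA kernels; Release enables optimizations.
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```

The resulting binaries (including `llama-server`) land under `build/bin/`.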
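The Tailscale step amounts to joining both machines to the same tailnet, after which the MacBook can reach the GPU workstation by its tailnet address with no port forwarding. A minimal sketch using the standard Tailscale CLI:

```shell
# Run on both the GPU workstation and the MacBook to join the tailnet.
sudo tailscale up

# On the MacBook: list peers and confirm the workstation is online.
tailscale status

# Print this machine's tailnet IPv4 address (a 100.x.y.z address).
tailscale ip -4
```

The workstation's tailnet IP (or MagicDNS hostname) is then the base address the coding tools point at.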
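The VRAM-management points (quantization, context size, KV-cache precision) all surface as `llama-server` flags. A hedged launch sketch follows; the model filename, port, and specific values are placeholders, not the author's exact settings:

```shell
# -m              path to the GGUF model (filename here is hypothetical)
# -c              context window; the main VRAM knob besides the quant choice
# -ngl            layers to offload to the GPU (99 = effectively all)
# -fa             flash attention; needed to quantize the V half of the cache
# --cache-type-*  store the KV cache at q8_0 instead of f16 to save VRAM
./build/bin/llama-server \
  -m qwen-UD-Q4_K_XL.gguf \
  -c 32768 \
  -ngl 99 \
  -fa \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --host 0.0.0.0 --port 8080
```

Binding to `0.0.0.0` lets the server accept connections arriving over the tailnet interface.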
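For the chat-template fix, `llama-server` can be told to render a corrected Jinja template instead of the one embedded in the GGUF, which is the usual way to repair tool-call formatting. The template filename below is a placeholder for whatever corrected template the guide supplies:

```shell
# --jinja enables Jinja template rendering; --chat-template-file overrides
# the template baked into the model file with a corrected one.
./build/bin/llama-server \
  -m qwen-UD-Q4_K_XL.gguf \
  --jinja \
  --chat-template-file fixed-template.jinja
```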