SemanticScuttle - klotz.me » Tags: local llm+llama.cpp

Tags: local llm* + llama.cpp*

0 bookmark(s) - Sort by: Date ↓ / Title /

I finally found an open-source local LLM that actually competes with cloud AI

The author explores the utility of Google DeepMind's Gemma 4 as a powerful option for running large language models locally on consumer hardware. By testing the E4B variant using tools like LM Studio and llama.cpp, they demonstrate how open-weight models can handle multimodal tasks including text, image analysis, and audio processing with impressive precision and privacy.

2026-05-12 Tags: gemma 4, google deepmind, local llm, multimodal, llama.cpp by klotz

Using a local LLM in OpenCode with llama.cpp

A comprehensive technical guide on setting up a high-performance local large language model environment for agentic coding tasks. The author demonstrates how to run a quantized Qwen3.5-27B model on a remote RTX 4090 workstation and access it from a MacBook using Tailscale, integrating the setup with OpenCode and Codex.
Key topics include:
* Step-by-step llama.cpp build configuration for CUDA support.
* Using Tailscale to create a secure network between client and GPU machine.
* Optimizing VRAM usage through specific quantization (UD-Q4_K_XL) and context size management.
* Implementing a corrected chat template to prevent tool-calling errors in agentic workflows.
* Performance insights regarding hybrid architectures and KV cache precision.

2026-04-11 Tags: llama.cpp, opencode, qwen3.5, local llm, rtx 4090, tailscale, coding assistant, gguf by klotz

New in llama.cpp: Anthropic Messages API

The llama.cpp server has introduced support for the Anthropic Messages API, a highly requested feature that allows users to run Claude-compatible clients with locally hosted models. This implementation enables powerful tools like Claude Code to interface directly with local GGUF models by internally converting Anthropic's message format to OpenAI's standard. Key features of this update include full support for chat completions with streaming, advanced tool use through function calling, token counting capabilities, vision support for multimodal models, and extended thinking for reasoning models. This development bridges the gap between proprietary AI ecosystems and local, privacy-focused inference pipelines, providing a seamless experience for developers working with agentic workloads and coding assistants.

ANTHROPIC_AUTH_TOKEN, ANTHROPIC_MODEL=

2026-04-11 Tags: llama.cpp, anthropic, api, claude code, local llm, gguf, tool use, function calling, llm by klotz

I'm running a 120B local LLM on 24GB of VRAM, and now it powers my smart home

This article details how to run a 120B parameter LLM locally with 24GB of VRAM and 64GB of system RAM, using a setup with Proxmox LXCs, Whisper for voice transcription, and integration with Home Assistant for smart home automation.

2025-12-29 Tags: llm, local llm, smart home, proxmox, whisper, home assistant, llama.cpp, gpt-oss by klotz

First / Previous / Next / Last / Page 1 of 0

SemanticScuttle - klotz.me

Tags: local llm* + llama.cpp*

Linked Tags

Related Tags