Tags: gpu* + nvidia*


  1. This article details the journey of deploying an on-premise Large Language Model (LLM) server, focusing on security considerations. It explores the rationale behind on-premise deployment for privacy and data control, outlining the goals of creating an air-gapped, isolated infrastructure. The authors delve into the hardware selection process, choosing components like an Nvidia RTX Pro 6000 Max-Q for its memory capacity. The deployment process starts with a minimal setup using llama.cpp, then progresses to containerization with Podman and the use of CDI for GPU access. Finally, the article discusses hardening techniques, including kernel module management and file permission restrictions, to minimize the attack surface and enhance security.
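A rough illustration of the article's "minimal setup with llama.cpp" stage is sketched below using the llama-cpp-python bindings with full GPU offload; the model path, context size, and prompt are placeholders rather than the configuration used in the article.

```python
# Minimal local-inference sketch with llama-cpp-python (built with CUDA support).
# Model path and settings are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical GGUF file
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=4096,        # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize why on-prem LLM hosting helps privacy."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```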
  2. >The method, called KV Cache Transform Coding (KVTC), applies ideas from media compression formats like JPEG to shrink the key-value cache behind multi-turn AI systems, lowering GPU memory demands and speeding up time-to-first-token by up to 8x.
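The quoted description does not give the algorithm, but the JPEG-style transform-coding idea it alludes to can be shown in toy form: apply a decorrelating transform (here a DCT), quantize the coefficients coarsely, and keep only the integers. This is a sketch of the general concept, not the KVTC method itself; the tensor shape and quantization step are arbitrary.

```python
# Toy transform-coding of a stand-in KV-cache block: DCT -> uniform quantization -> inverse.
# Illustrates the JPEG-like idea only; this is NOT the KVTC algorithm.
import numpy as np
from scipy.fft import dct, idct

kv = np.random.randn(32, 128).astype(np.float32)   # stand-in for one layer's K or V block

coeffs = dct(kv, axis=-1, norm="ortho")             # decorrelate along the feature dimension
step = 0.1                                          # quantization step: compression vs. quality trade-off
q = np.round(coeffs / step).astype(np.int16)        # coarse integer coefficients (what would be stored)

recon = idct(q.astype(np.float32) * step, axis=-1, norm="ortho")
print("max abs reconstruction error:", float(np.abs(recon - kv).max()))
```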
  3. The RTX 3090 offers a compelling combination of performance and 24GB of VRAM, making it a better choice for local LLM and AI workloads than newer Nvidia Blackwell GPUs such as the RTX 5070 and even the RTX 5080, whose smaller VRAM pools and pricing work against them; a rough VRAM sizing estimate follows below.
    2026-02-07 by klotz
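The sizing sketch below shows why the 24 GB figure matters for local models: weight storage alone for mid-sized models at common quantization levels. The bytes-per-parameter values are rules of thumb, not measurements from the article, and real usage adds KV cache and runtime overhead.

```python
# Back-of-the-envelope VRAM needed just for model weights at different quantization levels.
# Bytes-per-parameter figures are approximate rules of thumb.
bytes_per_param = {"fp16": 2.0, "q8_0": 1.07, "q4_k_m": 0.57}

for params_b in (8, 14, 32, 70):                      # model sizes in billions of parameters
    for fmt, bpp in bytes_per_param.items():
        gib = params_b * 1e9 * bpp / 2**30
        print(f"{params_b:>3}B {fmt:>7}: ~{gib:5.1f} GiB")
```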
  4. CUDA Tile is a new Python package that simplifies GPU programming by automatically tiling loops, handling data transfer, and optimizing memory access. It allows developers to write concise and readable code that leverages the full power of NVIDIA GPUs without needing to manually manage the complexities of parallel programming.
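For contrast, the sketch below shows the kind of hand-written shared-memory tiling that such a package is meant to automate. It is written with Numba's CUDA bindings, not the CUDA Tile API; the tile size, matrix sizes, and kernel structure are the standard textbook pattern rather than anything from the article.

```python
# Manual tiled matrix multiply via Numba CUDA -- the boilerplate CUDA Tile aims to remove.
import numpy as np
from numba import cuda, float32

TILE = 16  # threads per block edge; also the tile size staged in shared memory

@cuda.jit
def tiled_matmul(A, B, C):
    # Per-block staging buffers in fast shared memory.
    sA = cuda.shared.array(shape=(TILE, TILE), dtype=float32)
    sB = cuda.shared.array(shape=(TILE, TILE), dtype=float32)

    tx = cuda.threadIdx.x
    ty = cuda.threadIdx.y
    row = cuda.blockIdx.y * TILE + ty
    col = cuda.blockIdx.x * TILE + tx

    acc = float32(0.0)
    for t in range((A.shape[1] + TILE - 1) // TILE):
        # Each thread cooperatively loads one element of the A and B tiles.
        if row < A.shape[0] and t * TILE + tx < A.shape[1]:
            sA[ty, tx] = A[row, t * TILE + tx]
        else:
            sA[ty, tx] = 0.0
        if t * TILE + ty < B.shape[0] and col < B.shape[1]:
            sB[ty, tx] = B[t * TILE + ty, col]
        else:
            sB[ty, tx] = 0.0
        cuda.syncthreads()
        for k in range(TILE):
            acc += sA[ty, k] * sB[k, tx]
        cuda.syncthreads()

    if row < C.shape[0] and col < C.shape[1]:
        C[row, col] = acc

n = 512
A = np.random.rand(n, n).astype(np.float32)
B = np.random.rand(n, n).astype(np.float32)
C = np.zeros((n, n), dtype=np.float32)

threads = (TILE, TILE)
blocks = ((n + TILE - 1) // TILE, (n + TILE - 1) // TILE)
tiled_matmul[blocks, threads](A, B, C)   # Numba handles host/device transfers for NumPy arrays
print(np.allclose(C, A @ B, atol=1e-2))
```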
  5. A new patch enables Nvidia GPU support on Raspberry Pi 5 and Rockchip devices, allowing for GPU-accelerated compute tasks. The article details the setup process, performance testing with llama.cpp, and current limitations with display output.
  6. This article details the integration of Docker Model Runner with the NVIDIA DGX Spark, enabling faster and simpler local AI model development. It covers setup, usage, and benefits like data privacy, offline availability, and ease of customization.
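Once Model Runner is serving a model, it can be queried through its OpenAI-compatible API. The sketch below uses the openai Python client; the base URL and model identifier are assumptions that depend on how the runner is configured, so verify them against your own setup.

```python
# Query a locally served model through an OpenAI-compatible endpoint.
# Base URL and model name are placeholders -- adjust to your Model Runner configuration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed local endpoint; check your setup
    api_key="not-needed-locally",
)

resp = client.chat.completions.create(
    model="ai/llama3.2",                           # placeholder model identifier
    messages=[{"role": "user", "content": "Name one benefit of running models locally."}],
)
print(resp.choices[0].message.content)
```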
  7. Nvidia's DGX Spark is a relatively affordable AI workstation that prioritizes capacity over raw speed, enabling it to run models that consumer GPUs cannot. It features 128GB of memory and is based on the Blackwell architecture.
  8. Nvidia introduces the Rubin CPX GPU, designed to accelerate AI inference by decoupling the context and generation phases. It utilizes GDDR7 memory for lower cost and power consumption, aiming to redefine AI infrastructure.
  9. Nvidia has expanded its Jetson lineup with the Jetson AGX Thor Developer Kit, a compact platform that carries the new Jetson T5000 system-on-module. Although it is marketed as a developer system, its dimensions and form factor place it firmly in mini-PC territory, while its design and purpose align more with edge AI deployment than home computing.
    2025-08-31 by klotz
  10. Running GenAI models is easy. Scaling them to thousands of users, not so much. This guide details avenues for scaling AI workloads from proofs of concept to production-ready deployments, covering API integration, on-prem deployment considerations, hardware requirements, and tools like vLLM and Nvidia NIMs.
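As a concrete starting point for the serving-engine part of that stack, the snippet below runs offline batched generation with vLLM. The model name is a placeholder, and a GPU with enough memory to hold it is assumed.

```python
# Offline batched inference with vLLM; model name is a placeholder.
from vllm import LLM, SamplingParams

prompts = [
    "Explain KV caching in one sentence.",
    "What does tensor parallelism do?",
]
params = SamplingParams(temperature=0.7, max_tokens=64)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumes the GPU can hold this model
for out in llm.generate(prompts, params):
    print(out.prompt, "->", out.outputs[0].text.strip())
```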
