Tags: gpu* + machine learning*


  1. AMD now supports Google’s Gemma 4 models (2B–31B parameters) across its entire hardware lineup, including Instinct GPUs (datacenters), Radeon GPUs (workstations), and Ryzen AI processors (PCs). The integration is compatible with vLLM, SGLang, llama.cpp, Ollama, and Lemonade Server, aiming to optimize AI performance for both cloud and local deployment.
  2. Google AI introduces STATIC, a sparse matrix framework that accelerates constrained decoding for LLM-based generative retrieval. It addresses the inefficiency of traditional trie implementations on hardware accelerators by flattening the trie into a static Compressed Sparse Row (CSR) matrix, achieving up to 948x speedup and demonstrating improvements in YouTube video recommendations.
  3. This blog post details how to implement high-performance matrix multiplication using NVIDIA cuTile, focusing on Tile loading, computation, storage, and block-level parallel programming. It also covers best practices for Tile programming and performance optimization strategies.
  4. Nvidia's DGX Spark is a relatively affordable AI workstation that prioritizes capacity over raw speed, enabling it to run models that consumer GPUs cannot. It features 128GB of memory and is based on the Blackwell architecture.
  5. This tutorial introduces the essential topics of the PyTorch deep learning library in about one hour. It covers tensors, training neural networks, and training models on multiple GPUs.
  6. Running GenAI models is easy. Scaling them to thousands of users, not so much. This guide details avenues for scaling AI workloads from proofs of concept to production-ready deployments, covering API integration, on-prem deployment considerations, hardware requirements, and tools like vLLM and Nvidia NIMs.
  7. NVIDIA DGX Spark is a desktop-friendly AI supercomputer powered by the NVIDIA GB10 Grace Blackwell Superchip, delivering 1000 AI TOPS of performance with 128GB of memory. It is designed for prototyping, fine-tuning, and inference of large AI models.
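The CSR-flattened trie in entry 2 can be illustrated with a short sketch. This is a hedged, CPU-only Python illustration of the idea, not the STATIC framework's actual API: a pointer-based trie over token sequences is flattened into Compressed Sparse Row arrays, so the set of tokens allowed after any prefix is a contiguous slice, which is what makes the lookup amenable to hardware accelerators. The function names `build_csr_trie` and `allowed_next` are illustrative, not from the paper.

```python
def build_csr_trie(sequences):
    """Build a trie over token sequences, then flatten it to CSR arrays.

    Returns (row_ptr, tokens, children): node n's outgoing edges are
    tokens[row_ptr[n]:row_ptr[n+1]], leading to the nodes in
    children[row_ptr[n]:row_ptr[n+1]]. Node 0 is the root.
    """
    # Ordinary pointer-based trie first: node -> {token: child_node}.
    trie = [{}]
    for seq in sequences:
        node = 0
        for tok in seq:
            if tok not in trie[node]:
                trie.append({})
                trie[node][tok] = len(trie) - 1
            node = trie[node][tok]
    # Flatten: one contiguous run of (token, child) pairs per node.
    row_ptr, tokens, children = [0], [], []
    for edges in trie:
        for tok, child in sorted(edges.items()):
            tokens.append(tok)
            children.append(child)
        row_ptr.append(len(tokens))
    return row_ptr, tokens, children


def allowed_next(row_ptr, tokens, node):
    """Tokens permitted after reaching `node` (constrained-decoding mask)."""
    return tokens[row_ptr[node]:row_ptr[node + 1]]
```

In a real decoder, the returned slice would be turned into a logits mask per decoding step; the static arrays replace the per-step pointer chasing that makes dynamic tries slow on accelerators.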
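The load/compute/store pattern from entry 3 can also be sketched. This is a plain-Python, CPU-side analogue of tile programming, not cuTile code: the matrix product is computed one tile at a time, accumulating partial results per output tile before storing them back. The `TILE` size and function name are illustrative choices.

```python
TILE = 2  # tile edge length; GPU kernels would pick something like 64 or 128


def tiled_matmul(A, B):
    """C = A @ B over square lists-of-lists, with n divisible by TILE."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, TILE):          # tile row of C
        for j0 in range(0, n, TILE):      # tile column of C
            for k0 in range(0, n, TILE):  # accumulate over tiles along K
                # "Load" the A and B tiles, multiply-accumulate, "store".
                for i in range(i0, i0 + TILE):
                    for j in range(j0, j0 + TILE):
                        acc = C[i][j]
                        for k in range(k0, k0 + TILE):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

On a GPU, each (i0, j0) tile would map to one thread block, and the tile loads would go through shared memory; the blog post's best practices concern exactly those choices.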


SemanticScuttle - klotz.me: tagged with "gpu+machine learning"
