SemanticScuttle - klotz.me

Tags: gpu*

This post explores a new idea for parallelizing a simplified parsing task using a "stack monoid" and scan operations, potentially enabling efficient GPU implementation of parsing algorithms.

2025-11-01 Tags: stack monoid, gpu, parsing, parallelization, monoids, scan, functional programming, piet-gpu by klotz
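The stack-monoid idea in the entry above can be illustrated with a minimal sketch. It is not the post's implementation. It assumes the simplified parsing task is matching "(" and ")" and that each monoid element means "pop some items, then push some items". The names `combine`, `to_element`, and `stack_scan` are hypothetical, and the sequential `itertools.accumulate` stands in for a parallel scan. Only the associativity of `combine` is what would let a GPU scan evaluate the same result in parallel.

```python
from functools import reduce
from itertools import accumulate

# Assumed representation: (pops, pushes) means "pop `pops` items from the
# stack, then push the items in `pushes`" (a tuple, bottom to top).
IDENTITY = (0, ())

def combine(a, b):
    """Associative composition: the effect of applying a, then b."""
    a_pops, a_pushes = a
    b_pops, b_pushes = b
    if b_pops <= len(a_pushes):
        # b's pops are fully absorbed by items that a pushed.
        return (a_pops, a_pushes[:len(a_pushes) - b_pops] + b_pushes)
    # b pops everything a pushed and reaches further down the stack.
    return (a_pops + b_pops - len(a_pushes), b_pushes)

def to_element(index, ch):
    """Map one input character to a monoid element (hypothetical encoding)."""
    if ch == "(":
        return (0, (index,))   # push the position of the open bracket
    if ch == ")":
        return (1, ())         # pop one open bracket
    return IDENTITY            # other characters leave the stack unchanged

def stack_scan(text):
    """Inclusive scan: the stack effect of every prefix of `text`."""
    elements = [to_element(i, ch) for i, ch in enumerate(text)]
    return list(accumulate(elements, combine))

if __name__ == "__main__":
    for ch, state in zip("(a(b)c)", stack_scan("(a(b)c)")):
        print(ch, state)
```

Because `combine` is associative, the input can be split into chunks, each chunk reduced independently, and the partial results combined afterwards, which is the property a scan-based GPU implementation would rely on.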

Apple unleashes M5, the next big leap in AI performance for Apple silicon

M5 delivers over 4x the peak GPU compute performance for AI compared to M4, featuring a next-generation GPU with a Neural Accelerator in each core, a more powerful CPU, a faster Neural Engine, and higher unified memory bandwidth.

2025-10-15 Tags: m5, apple, llm, gpu, cpu, neural engine, macbook pro, ipad pro, apple vision pro by klotz

Docker Model Runner on the new NVIDIA DGX Spark: a new paradigm for developing AI locally

This article details the integration of Docker Model Runner with the NVIDIA DGX Spark, enabling faster and simpler local AI model development. It covers setup, usage, and benefits like data privacy, offline availability, and ease of customization.

2025-10-15 Tags: docker, model runner, nvidia, dgx spark, ai, ml, local development, containerization, gpu, cuda by klotz

DGX Spark Nvidia’s desktop supercomputer: first look

Nvidia's DGX Spark is a relatively affordable AI workstation that prioritizes capacity over raw speed, enabling it to run models that consumer GPUs cannot. It features 128GB of memory and is based on the Blackwell architecture.

2025-10-14 Tags: llm, nvidia, dgx spark, gpu, hardware, machine learning by klotz

Nvidia's new CPX GPU aims to change the game in AI inference — how the debut of cheaper and cooler GDDR7 memory could redefine AI inference infrastructure

Nvidia introduces the Rubin CPX GPU, designed to accelerate AI inference by decoupling the context and generation phases. It utilizes GDDR7 memory for lower cost and power consumption, aiming to redefine AI infrastructure.

2025-10-05 Tags: nvidia, cpx gpu, inference, gddr7, rubin, hardware, data center, gpu, llm by klotz

kitty

The fast, feature-rich, GPU based terminal emulator. It's capable, scriptable, composable, cross-platform, and innovative.

2025-09-04 Tags: python, terminal, emulator, kitty, gpu, shell, scriptable, cross-platform, performance by klotz

Nvidia quietly unveiled its fastest mini PC ever, capable of topping 2070 TFLOPS - and if you squint enough, you might even think it looks like an RTX 5090

Nvidia has expanded its Jetson lineup with the Jetson AGX Thor Developer Kit, a compact platform that carries the new Jetson T5000 system-on-module. It is marketed as a developer system, and its dimensions and form factor place it firmly in mini PC territory, but its design and purpose align more with edge AI deployment than with home computing.

2025-08-31 Tags: nvidia, jetson, mini pc, llm, robotics, blackwell, arm, gpu by klotz

My mind was blown: running a 120B parameter AI model on a budget GPU at home

A 120-billion-parameter OpenAI model can now run on consumer hardware thanks to the Mixture of Experts (MoE) technique, which significantly reduces memory requirements and lets most of the processing run on the CPU while key parts are offloaded to a modest GPU.

2025-08-21 Tags: llm, mixture of experts, 120b, gpu, cpu, openai, gpt-oss-120b by klotz

Gemma 3: How to Run & Fine-tune

How to run Gemma 3 effectively with Unsloth's GGUFs on llama.cpp, Ollama, and Open WebUI, and how to fine-tune it with Unsloth. The page covers running Gemma 3 on various platforms, including phones, and addresses potential issues with float16 precision along with recommended configuration settings.

2025-08-16 Tags: gemma 3, llm, fine-tuning, llama.cpp, unsloth, gguf, gpu, colab, vision, audio, oobabooga by klotz

7 things I wish I knew when I started self-hosting LLMs

This article details 7 lessons the author learned while self-hosting Large Language Models (LLMs), covering topics like the importance of memory bandwidth, quantization, electricity costs, hardware choices beyond Nvidia, prompt engineering, Mixture of Experts models, and starting with simpler tools like LM Studio.

2025-07-23 Tags: llm, self-hosting, gpu, quantization, memory bandwidth, ollama, lm studio, mixture of experts by klotz
