Tags: llms* + gpu*


  1. Apple's M5 delivers over 4x the peak GPU compute performance for AI compared to the M4, featuring a next-generation GPU with a Neural Accelerator in each core, a more powerful CPU, a faster Neural Engine, and higher unified memory bandwidth.
  2. Nvidia's DGX Spark is a relatively affordable AI workstation that prioritizes memory capacity over raw speed, letting it run models that consumer GPUs cannot fit. It offers 128GB of unified memory and is built on the Blackwell architecture.
  3. Nvidia introduces the Rubin CPX GPU, designed to accelerate AI inference by decoupling the compute-heavy context (prefill) phase from the memory-bound generation (decode) phase. It uses GDDR7 memory for lower cost and power consumption than HBM, aiming to reshape AI inference infrastructure. (A toy sketch of the prefill/decode split follows the list.)
  4. Nvidia has expanded its Jetson lineup with the Jetson AGX Thor Developer Kit, a compact platform built around the new Jetson T5000 system-on-module. Although marketed as a developer system, its dimensions and form factor place it firmly in mini PC territory, even if its design and purpose align more with edge AI deployment than home computing.
    2025-08-31 by klotz
  5. A 120-billion-parameter OpenAI model can now run on consumer hardware thanks to the Mixture of Experts (MoE) technique: because only a few experts are active per token, the bulk of the weights can stay in CPU RAM while the always-used layers are offloaded to a modest GPU. (The memory arithmetic is sketched after the list.)
  6. Unsloth's guide to running Gemma 3 effectively with their GGUFs on llama.cpp, Ollama, and Open WebUI, and to fine-tuning it with Unsloth. It covers running Gemma 3 on various platforms, including phones, addresses known float16 precision issues, and provides optimal configuration settings. (A minimal llama-cpp-python invocation follows the list.)
  7. This article details 7 lessons the author learned while self-hosting Large Language Models (LLMs): the importance of memory bandwidth, quantization, electricity costs, hardware choices beyond Nvidia, prompt engineering, Mixture of Experts models, and starting with simpler tools like LM Studio. (The bandwidth point reduces to back-of-envelope arithmetic, sketched after the list.)
  8. This tutorial introduces the essential topics of the PyTorch deep learning library in about an hour, covering tensors, training neural networks, and training models on multiple GPUs. (A condensed training-loop example follows the list.)
  9. A web application for parsing GGUF files. (The fixed header a GGUF parser reads first is sketched after the list.)
    2025-04-28 by klotz
  10. Running GenAI models is easy; scaling them to thousands of users, not so much. This guide details avenues for scaling AI workloads from proofs of concept to production-ready deployments, covering API integration, on-prem deployment considerations, hardware requirements, and tools like vLLM and Nvidia NIMs. (A minimal vLLM call follows the list.)
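
Entry 3's "decoupled" phases refer to disaggregated inference: prefill processes the whole prompt in one compute-bound batch and produces the KV cache, while decode generates one token at a time and mostly streams that cache. A toy numpy sketch of the split, with illustrative sizes and a single attention step standing in for a full model (not Nvidia's API):

```python
import numpy as np

d = 64           # head dimension (toy size)
prompt_len = 8   # tokens in the prompt

rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention for one query over a KV cache."""
    scores = q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

# Prefill (context) phase: all prompt tokens in one batch, compute-bound.
# Its product is the KV cache, which a disaggregated design hands off
# to a different device or worker.
prompt = rng.standard_normal((prompt_len, d))
K_cache = prompt @ Wk
V_cache = prompt @ Wv

# Decode (generation) phase: one token per step, re-reading the whole
# cache each time, so it is bound by memory bandwidth rather than FLOPs.
x = rng.standard_normal(d)
for _ in range(4):
    K_cache = np.vstack([K_cache, x @ Wk])
    V_cache = np.vstack([V_cache, x @ Wv])
    x = attend(x @ Wq, K_cache, V_cache)  # stand-in for the full layer stack

print("KV cache holds", K_cache.shape[0], "token entries")
```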
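
For entry 5, a back-of-envelope look at why MoE makes a 120B model feasible locally. The active-parameter count and quantization width are illustrative assumptions, not the model's published specs:

```python
# Back-of-envelope memory math for a 120B-parameter MoE model.
total_params    = 120e9   # all experts combined
active_params   = 5e9     # assumed: params actually used per token
bytes_per_param = 0.5     # ~4-bit quantization

total_gb  = total_params  * bytes_per_param / 1e9
active_gb = active_params * bytes_per_param / 1e9

print(f"full weights (must fit in CPU RAM):   ~{total_gb:.0f} GB")
print(f"hot path per token (modest GPU-size): ~{active_gb:.1f} GB")
# The MoE trick: keep the huge, sparsely-used expert tensors in system
# RAM and offload only the always-used layers (attention, norms) to GPU.
```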
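
For entry 6, a minimal sketch of loading a Gemma 3 GGUF through llama-cpp-python. The filename is hypothetical, and the sampling values are the Gemma-3-style settings the guide discusses; verify both against the guide itself:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-3-4b-it-Q4_K_M.gguf",  # hypothetical local GGUF path
    n_gpu_layers=-1,   # offload every layer that fits on the GPU
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF is."}],
    temperature=1.0, top_k=64, top_p=0.95,  # assumed Gemma-3-style sampling
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```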
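
Entry 7's memory-bandwidth lesson reduces to simple arithmetic: each decoded token streams roughly the whole set of weights once, so bandwidth divided by model size bounds tokens per second. The bandwidth figures below are ballpark, not measured:

```python
# Upper bound on decode speed: bandwidth / bytes read per token.
model_gb = 70 * 0.5   # 70B params at ~4-bit quantization ≈ 35 GB

for name, bw_gb_s in [("dual-channel DDR5", 90),
                      ("Apple M-series unified memory", 400),
                      ("RTX 4090 GDDR6X", 1008)]:
    print(f"{name:30s} <= {bw_gb_s / model_gb:5.1f} tok/s")
```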
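
For entry 8, the flavor of the tutorial's core loop, condensed into a toy linear regression (not the tutorial's own code):

```python
import torch

# Synthetic data: y is a linear function of X plus noise.
X = torch.randn(256, 4)
y = X @ torch.tensor([1.0, -2.0, 0.5, 3.0]) + 0.1 * torch.randn(256)

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = torch.nn.MSELoss()

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
# The tutorial's multi-GPU section builds on this same loop, typically by
# wrapping the model in DistributedDataParallel.
```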
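
For entry 9, the first thing any GGUF parser reads is the fixed little-endian header: a 4-byte magic, a uint32 version, then uint64 tensor and metadata-entry counts. A minimal reader following that published layout:

```python
import struct
import sys

def read_gguf_header(path):
    """Read the fixed GGUF header: magic, version, tensor/KV counts."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return version, n_tensors, n_kv

if __name__ == "__main__":
    v, t, k = read_gguf_header(sys.argv[1])
    print(f"GGUF v{v}: {t} tensors, {k} metadata entries")
```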
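
For entry 10, vLLM's offline batch API in miniature; the model name is an assumption:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=1)
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches and schedules prompts internally (continuous batching),
# which is what makes it scale past single-user proofs of concept.
outputs = llm.generate(["Explain continuous batching in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

For production serving, the same engine can instead be exposed as an OpenAI-compatible HTTP server via the `vllm serve` command.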


