This article details how to run Large Language Models (LLMs) on Intel GPUs using the llama.cpp framework and its new SYCL backend, offering performance improvements and broader hardware support.
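A minimal build-and-run sketch on Linux, assuming a recent llama.cpp checkout and an installed Intel oneAPI toolkit (flag and binary names follow the upstream SYCL docs; older releases used `LLAMA_SYCL` instead of `GGML_SYCL`):

```bash
# Load the oneAPI environment (icx/icpx compilers, SYCL runtime).
source /opt/intel/oneapi/setvars.sh

# Configure and build llama.cpp with the SYCL backend enabled.
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j

# Verify the Intel GPU is visible, then run with all layers offloaded (-ngl 99).
./build/bin/llama-ls-sycl-device
./build/bin/llama-cli -m models/model.Q4_0.gguf -p "Hello" -ngl 99
```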
Orange Pi has announced the Orange Pi AI Station, a compact edge computing platform featuring the Ascend 310 processor, offering up to 176 TOPS of AI compute performance with options for up to 96GB of LPDDR4X memory and NVMe storage.
A deep dive into the process of LLM inference, covering tokenization, transformer architecture, KV caching, and optimization techniques for efficient text generation.
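To make the KV-caching step concrete, here is the standard decode-time formulation (notation mine, not necessarily the article's): at step $t$ only the new token's query, key, and value are computed, and $k_t$, $v_t$ are appended to the cache so attention reuses everything computed before:

$$K_{1:t} = [\,K_{1:t-1};\,k_t\,], \qquad V_{1:t} = [\,V_{1:t-1};\,v_t\,]$$

$$\mathrm{Attention}(q_t, K_{1:t}, V_{1:t}) = \operatorname{softmax}\!\left(\frac{q_t K_{1:t}^{\top}}{\sqrt{d_k}}\right) V_{1:t}$$

Caching turns each generation step from recomputing attention over the whole prefix into a single new row of attention, at the cost of cache memory that grows linearly with context length.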
A visual introduction to probability and statistics, covering basic probability, compound probability, probability distributions, frequentist inference, Bayesian inference, and regression analysis. Created by Daniel Kunin and team with interactive visualizations using D3.js.
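As a one-formula taste of the Bayesian-inference chapter (this is standard Bayes' rule; the site's own notation may differ):

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D \mid H)\,P(H) + P(D \mid \neg H)\,P(\neg H)}$$

For example, with a 1% prior, a 99% true-positive rate, and a 5% false-positive rate, a positive result yields a posterior of $0.0099 / (0.0099 + 0.0495) \approx 16.7\%$, a classic illustration of why priors matter.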
This tutorial guides you through installing and using an inference snap, specifically Qwen 2.5 VL, a multi-modal large language model. It covers installation, status checks, basic chat, and configuring Open WebUI for image-based prompts.
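A minimal sketch of the install-and-check steps, assuming the snap is published as `qwen-vl` (the actual snap name and any channel flags come from the tutorial itself):

```bash
# Install the inference snap (name assumed; see the tutorial for the real one).
sudo snap install qwen-vl

# Confirm the install and check whether its background service is running.
snap info qwen-vl
snap services qwen-vl
```

Once the service is up, Open WebUI is pointed at the endpoint the snap exposes; the tutorial walks through that configuration and the image-prompt setup.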
Canonical has announced optimized inference snaps, a new way to deploy AI models on Ubuntu devices: each snap automatically selects the engine, quantization, and architecture best suited to the device's specific silicon.
On October 23rd, we announced the beta availability of silicon-optimized AI models in Ubuntu. Developers can locally install DeepSeek R1 and Qwen 2.5 VL with a single command, benefiting from maximized hardware performance and automated dependency management.
This article details the performance of Unsloth Dynamic GGUFs on the Aider Polyglot benchmark, showing that dynamic quantization can compress LLMs such as DeepSeek-V3.1 to as low as 1-bit while still outperforming models like GPT-4.5 and Claude-4-Opus. It also covers the benchmark setup, comparisons with other quantization methods, and chat template bug fixes.
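For reference, fetching and running one of these quants might look like the sketch below; the repo path, the `UD-IQ1_S` quant tag, and the shard filename are assumptions based on Unsloth's usual naming, so check the actual model card:

```bash
# Download only the 1-bit dynamic quant shards (repo and pattern assumed).
huggingface-cli download unsloth/DeepSeek-V3.1-GGUF \
  --include "*UD-IQ1_S*" --local-dir DeepSeek-V3.1-GGUF

# Point llama-cli at the first shard; remaining shards load automatically.
llama-cli -m DeepSeek-V3.1-GGUF/UD-IQ1_S/DeepSeek-V3.1-UD-IQ1_S-00001-of-00003.gguf \
  -ngl 99 -c 8192 -p "Hello"
```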
Nvidia introduces the Rubin CPX GPU, designed to accelerate long-context AI inference by decoupling the compute-bound context (prefill) phase from the bandwidth-bound generation (decode) phase. It uses GDDR7 memory rather than HBM to lower cost and power consumption.
A detailed guide to running the new gpt-oss models locally at maximum performance with `llama.cpp`. It covers a wide range of hardware configurations, explains the relevant CLI arguments, and includes benchmarks for Apple Silicon devices.
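As a starting point, the guide's recipe reduces to something like the following (the `ggml-org/gpt-oss-20b-GGUF` repo and the flag values are my assumptions; the guide's per-hardware tables give tuned settings):

```bash
# Serve gpt-oss-20b with an OpenAI-compatible API on localhost:8080.
# --jinja applies the model's bundled chat template; -ngl 99 offloads all
# layers to the GPU (on Apple Silicon, to unified memory via Metal).
llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -ngl 99 -c 16384
```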