Tags: nvidia* + llm*


  1. The method, called KV Cache Transform Coding (KVTC), applies ideas from media compression formats like JPEG to shrink the key-value cache behind multi-turn AI systems, lowering GPU memory demands and speeding up time-to-first-token by up to 8x.
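    The transform-coding idea is easiest to see in miniature. The sketch below illustrates generic transform coding (an orthonormal DCT followed by uniform quantization, the JPEG recipe in 1-D), not NVIDIA's actual KVTC pipeline: decorrelate a cache row with a transform, store the coefficients as small integers, and invert at read time.

    ```python
    import math

    def dct(xs):
        # Orthonormal DCT-II of one vector: concentrates energy in a
        # few coefficients, which is what makes coarse quantization cheap.
        n = len(xs)
        out = []
        for k in range(n):
            s = sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                    for i, x in enumerate(xs))
            c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            out.append(c * s)
        return out

    def idct(cs):
        # Inverse (DCT-III with matching normalization).
        n = len(cs)
        out = []
        for i in range(n):
            s = 0.0
            for k, c in enumerate(cs):
                a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
                s += a * c * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            out.append(s)
        return out

    def compress(vec, bits=8):
        # Transform, then uniformly quantize coefficients to signed ints.
        coeffs = dct(vec)
        qmax = 2 ** (bits - 1) - 1
        scale = max(abs(c) for c in coeffs) / qmax or 1.0
        return [round(c / scale) for c in coeffs], scale

    def decompress(q, scale):
        return idct([c * scale for c in q])

    row = [math.sin(0.3 * i) for i in range(32)]  # toy "cache row"
    q, scale = compress(row)
    rec = decompress(q, scale)
    err = max(abs(a - b) for a, b in zip(row, rec))
    ```

    Storing the row as 8-bit integers plus a single scale cuts memory roughly 4x versus float32, at the price of the small reconstruction error measured in `err`.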
  2. The RTX 3090 offers a compelling combination of performance and 24GB of VRAM, making it a better choice for local LLM and AI workloads than newer Nvidia Blackwell GPUs like the RTX 5070 and even the RTX 5080, due to VRAM limitations and pricing.
    2026-02-07 by klotz
  3. Per the discussion, /u/septerium reported the best performance for GLM 4.7 Flash (UD-Q6_K_XL) on an RTX 5090 with these settings:
    - GPU: NVIDIA RTX 5090.
    - Throughput: ~150 tokens/s.
    - Context: 48k tokens fit entirely in VRAM.
    - Quantization: UD-Q6_K_XL (Unsloth GGUF).
    - Flash Attention: enabled (-fa on).
    - Context size: 48,000 (--ctx-size 48000).
    - GPU layers: 99 (-ngl 99), so the entire model runs on the GPU.
    - Sampler & inference parameters:
      - Temperature: 0.7 (recommended by Unsloth for tool calls).
      - Top-P: 1.0.
      - Min-P: 0.01.
      - Repeat penalty: must be disabled (llama.cpp does this by default, but users warned that other platforms might not).
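    Scripted, the reported configuration amounts to the following argument list (a sketch for launching llama-server from Python; the GGUF filename is a placeholder, not the thread's exact path):

    ```python
    # Placeholder model path; substitute the real GGUF file.
    MODEL = "GLM-4.7-Flash-UD-Q6_K_XL.gguf"

    cmd = [
        "llama-server",
        "-m", MODEL,
        "-fa", "on",            # flash attention enabled
        "-ngl", "99",           # offload every layer to the GPU
        "--ctx-size", "48000",  # 48k-token context held in VRAM
        "--temp", "0.7",        # Unsloth's tool-call recommendation
        "--top-p", "1.0",
        "--min-p", "0.01",
    ]
    # Hand cmd to subprocess.Popen(cmd) to actually start the server.
    ```

    No repeat-penalty flag appears because llama.cpp already disables it by default, per the thread.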
  4. NVIDIA AI releases Nemotron-Elastic-12B, a 12B-parameter reasoning model that embeds nested 9B and 6B variants in the same parameter space, so a single training job yields multiple model sizes.
  5. This blog post details how to build a natural language Bash agent using NVIDIA Nemotron Nano v2, requiring roughly 200 lines of Python code. It covers the core components, safety considerations, and offers both a from-scratch implementation and a simplified approach using LangGraph.
    2025-11-17 by klotz
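    The safety layer such an agent needs can be illustrated with a small allowlist check (a hypothetical helper, not the blog post's actual code): the agent refuses any command whose programs are not approved or that uses shell redirection or chaining.

    ```python
    import shlex

    # Hypothetical allowlist, not the blog post's actual policy.
    SAFE_COMMANDS = {"ls", "cat", "grep", "head", "tail", "wc", "pwd"}

    def is_safe(command: str) -> bool:
        """Return True only if every pipeline stage starts with an
        allowlisted program and no shell metacharacters appear."""
        # Reject redirection, chaining, and command substitution outright.
        if any(tok in command for tok in (">", "&", ";", "`", "$(")):
            return False
        for stage in command.split("|"):
            parts = shlex.split(stage)
            if not parts or parts[0] not in SAFE_COMMANDS:
                return False
        return True
    ```

    An agent loop would run the model's proposed command through `is_safe` before executing it, and ask the user to confirm anything rejected.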
  6. This discussion details performance benchmarks of llama.cpp on an NVIDIA DGX Spark, including tests for various models (gpt-oss-20b, gpt-oss-120b, Qwen3, Qwen2.5, Gemma, GLM) with different context depths and batch sizes.
    2025-10-15 by klotz
  7. Simon Willison received a preview unit of the NVIDIA DGX Spark, a desktop "AI supercomputer" retailing around $4,000. He details his experience setting it up and navigating the ecosystem, highlighting both the hardware's impressive specs (ARM64, 128GB RAM, Blackwell GPU) and the initial software challenges.

    Key takeaways:

    * **Hardware:** The DGX Spark is a compact, powerful machine aimed at AI researchers.
    * **Software Hurdles:** Initial setup was complicated by the need for ARM64-compatible software and CUDA configurations, though NVIDIA has significantly improved documentation recently.
    * **Tools & Ecosystem:** Claude Code was invaluable for troubleshooting. Ollama, `llama.cpp`, LM Studio, and vLLM are already gaining support for the Spark, indicating a growing ecosystem.
    * **Networking:** Tailscale simplifies remote access.
    * **Early Verdict:** It's too early to definitively recommend the device, but recent ecosystem improvements are promising.
    2025-10-15 by klotz
  8. Nvidia's DGX Spark is a relatively affordable AI workstation that prioritizes capacity over raw speed, enabling it to run models that consumer GPUs cannot. It features 128GB of memory and is based on the Blackwell architecture.
  9. Nvidia introduces the Rubin CPX GPU, designed to accelerate AI inference by decoupling the context and generation phases. It utilizes GDDR7 memory for lower cost and power consumption, aiming to redefine AI infrastructure.
  10. Canonical announced today that they will formally support the NVIDIA CUDA toolkit and also make it available via the Ubuntu repositories. This aims to simplify CUDA installation and usage on Ubuntu, particularly with the rise of AI development.
    2025-09-19 by klotz


SemanticScuttle - klotz.me: tagged with "nvidia+llm"
