SemanticScuttle - klotz.me » Tags: pytorch+llms

Tags: pytorch* + llms*


The article details “autoresearch,” a project by Karpathy where an AI agent autonomously experiments with training a small language model (nanochat) to improve its performance. The agent modifies the `train.py` file, trains for a fixed 5-minute period, and evaluates the results, repeating this process to iteratively refine the model. The project aims to demonstrate autonomous AI research, focusing on a simplified, single-GPU setup with a clear metric (validation bits per byte).

* **Autonomous Research:** The core concept of AI-driven experimentation.
* **nanochat:** The small language model used for training.
* **Fixed Time Budget:** Each experiment runs for exactly 5 minutes.
* **program.md:** The file containing instructions for the AI agent.
* **Single-File Modification:** The agent only edits `train.py`.
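The loop and metric summarized above can be sketched in a few lines. This is a minimal illustration, not code from the project: `bits_per_byte` uses the standard conversion from a summed negative log-likelihood in nats, while `hill_climb` and its `evaluate` callback are hypothetical stand-ins for "train for the fixed budget, then measure". The sketch assumes a change is kept only when it lowers the metric.

```python
import math
from typing import Callable, Iterable, Optional, Tuple


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes by the size of the raw text rather than by token count.
    return total_nll_nats / (math.log(2) * total_bytes)


def hill_climb(
    candidates: Iterable[str],
    evaluate: Callable[[str], float],
) -> Tuple[Optional[str], float]:
    """Keep the candidate with the lowest score (hypothetical helper).

    `evaluate` stands in for one fixed-budget training run followed by
    a validation measurement; lower is better.
    """
    best_name: Optional[str] = None
    best_score = math.inf
    for name in candidates:
        score = evaluate(name)
        if score < best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

Because the metric is normalized per byte, runs that change the tokenizer remain comparable, which is one plausible reason to prefer it over per-token loss.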

2026-03-09 Tags: ai, autoresearch, llm, training, nanochat, autonomous agents, machine learning, python, pytorch, andrej karpathy by klotz

Sparse Transition Matrix-Accelerated Trie Index for Constrained Decoding

This repository provides the official implementation of the STATIC (Sparse Transition-Accelerated Trie Index for Constrained decoding) framework, as described in Su et al., 2026. STATIC is a high-performance method for constraining outputs to a prespecified set during autoregressive decoding from large language models, designed for maximum efficiency on modern hardware accelerators such as GPUs and TPUs.
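To make the idea of trie-constrained decoding concrete, here is a minimal sketch using a plain nested-dictionary trie. It is not the STATIC method and does not use the repository's API: the sparse-matrix acceleration is omitted, and `END`, `build_trie`, `allowed_next`, and `mask_logits` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence

END = -1  # hypothetical sentinel marking the end of an allowed sequence


def build_trie(sequences: Sequence[Sequence[int]]) -> dict:
    """Build a nested-dict trie over the allowed token-ID sequences."""
    root: dict = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def allowed_next(trie: dict, prefix: Sequence[int]) -> List[int]:
    """Return the token IDs that may follow `prefix`, in sorted order."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []  # the prefix has already left the allowed set
        node = node[tok]
    return sorted(t for t in node if t != END)


def mask_logits(logits: List[float], allowed: Sequence[int]) -> List[float]:
    """Set the logit of every disallowed token to negative infinity."""
    allowed_set = set(allowed)
    return [x if i in allowed_set else float("-inf") for i, x in enumerate(logits)]
```

At each decoding step, the logits would be masked with `allowed_next(trie, generated_so_far)` before sampling or beam expansion, so only continuations inside the prespecified set can be chosen.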

2026-03-02 Tags: constrained decoding, large language models, sparse trie, accelerator, jax, pytorch, inference, beam search, github, youtube, google by klotz

GokuMohandas/Made-With-ML

Learn how to design, develop, deploy and iterate on production-grade ML applications.

2026-01-02 Tags: python, data-science, machine-learning, natural-language-processing, deep-learning, pytorch, data-engineering, ray, data-quality, distributed-training, mlops, distributed-ml, llm, automation by klotz

Meet oLLM: A Lightweight Python Library that brings 100K-Context LLM Inference to 8 GB Consumer GPUs via SSD Offload—No Quantization Required

oLLM is a Python library for running large-context Transformers on NVIDIA GPUs by offloading weights and KV-cache to SSDs. It supports models like Llama-3, GPT-OSS-20B, and Qwen3-Next-80B, enabling up to 100K tokens of context on 8-10 GB GPUs without quantization.
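A rough estimate shows why a 100K-token context strains an 8-10 GB GPU. The sketch below computes the size of an unquantized KV cache; the configuration values (32 layers, 8 key/value heads, head dimension 128, 16-bit values) are illustrative assumptions resembling an 8B-parameter model with grouped-query attention, not figures taken from the article or from oLLM.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_value: int = 2,  # assumption: fp16/bf16 storage
) -> int:
    """Estimate KV-cache size in bytes for one sequence."""
    # The factor 2 accounts for storing both a key and a value per position.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed, illustrative configuration (see the note above).
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=100_000)
print(f"{size / 1e9:.1f} GB")  # prints "13.1 GB"
```

Under these assumptions the cache alone exceeds the GPU's memory before any weights are loaded, which is consistent with the summary's point about offloading to SSD.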

2025-09-30 Tags: ollm, llm, inference, python, huggingface, pytorch, llama-3, gpt-oss, qwen3-next by klotz

PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs

This tutorial introduces the essential topics of the PyTorch deep learning library in about one hour. It covers tensors, training neural networks, and training models on multiple GPUs.

2025-07-05 Tags: pytorch, deep learning, tensors, neural networks, gpu, automatic differentiation, machine learning, llm by klotz

exo: Run your own AI cluster at home with everyday devices

Unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, NVIDIA, Raspberry Pi, pretty much any device!

2025-02-28 Tags: llm, cluster, gpu, mlx, tinygrad, pytorch, llama.cpp, distributed systems by klotz

Nvidia's CUDA moat may not be as impenetrable as you think • The Register

The article discusses the competition Nvidia faces from Intel and AMD in the GPU market. Although both competitors have introduced accelerators that match or surpass Nvidia's offerings in memory capacity, performance, and price, Nvidia retains a strong advantage through its CUDA software ecosystem. CUDA has been a significant barrier to switching hardware because of the effort required to port and optimize existing code. Both Intel and AMD have developed tools to ease this transition, such as AMD's HIPIFY and Intel's SYCL-based tooling. The article also notes that most developers now write higher-level code using frameworks like PyTorch, which can run on different hardware with varying levels of support and performance. This shift towards higher-level frameworks has reduced the impact of Nvidia's CUDA moat, though challenges remain in ensuring compatibility and performance across hardware platforms.

2024-12-25 Tags: nvidia, cuda, the register, gpu, llm, pytorch by klotz

Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs

2024-04-19 Tags: llm, attention, python, pytorch, self-attention by klotz

Finetune LLMs on your own consumer hardware using tools from PyTorch and Hugging Face ecosystem | PyTorch

An efficient method for fine-tuning LLMs using LoRA and QLoRA, making it possible to train them even on consumer hardware.
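The saving from LoRA comes from training two small low-rank factors instead of a full weight matrix. The arithmetic below is a minimal sketch of that parameter count; the function names and the 4096 x 4096 layer with rank 8 are hypothetical values chosen for illustration, not settings from the linked post.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters when the update is factored as B @ A.

    A has shape (rank, d_in) and B has shape (d_out, rank); the original
    weight matrix stays frozen.
    """
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, for illustration only.
dense = full_params(4096, 4096)
lora = lora_trainable_params(4096, 4096, rank=8)
print(f"{lora} of {dense} parameters ({100 * lora / dense:.2f}%)")
# prints "65536 of 16777216 parameters (0.39%)"
```

QLoRA additionally stores the frozen base weights in a quantized form, which reduces memory further; that step is not modeled here.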

2024-01-12 Tags: llm, fine tuning, qlora, lora, peft, pytorch, hugging face, fine-tuning, llms by klotz

