SemanticScuttle - klotz.me » klotz: huggingface+llm

klotz: huggingface* + llm*

Meet oLLM: A Lightweight Python Library that brings 100K-Context LLM Inference to 8 GB Consumer GPUs via SSD Offload—No Quantization Required

oLLM is a Python library for running large-context Transformers on NVIDIA GPUs by offloading weights and KV-cache to SSDs. It supports models like Llama-3, GPT-OSS-20B, and Qwen3-Next-80B, enabling up to 100K tokens of context on 8-10 GB GPUs without quantization.

2025-09-30 Tags: ollm, llm, inference, python, huggingface, pytorch, llama-3, gpt-oss, qwen3-next by klotz

Chess Llama - Training a tiny Llama model to play chess

This blog post details the training of 'Chess Llama', a small Llama model designed to play chess. It covers the inspiration behind the project (Chess GPT), the dataset used (Lichess Elite database), the training process using Hugging Face Transformers, and the model's performance (Elo rating of 1350-1400). It also includes links to try the model and view the source code.

2025-07-21 Tags: chess, llama, llm, machine learning, artificial intelligence, deep learning, transformers, huggingface, chessgpt, uci, pgn by klotz

DeepSeek-R1-0528-Qwen3-8B-GGUF

This model card covers DeepSeek-R1-0528-Qwen3-8B, a Qwen3-8B model distilled from DeepSeek-R1-0528 and distributed here in quantized GGUF form. It highlights the model's improved reasoning capabilities, evaluation results, usage guidelines, and licensing information, and offers several GGUF quantization options for local execution.

2025-05-30 Tags: deepseek-r1, qwen3, gguf, llm, quantization, reasoning, text generation, transformers, model card, mcp, huggingface by klotz

gguf-parser-web

A web application for parsing GGUF files.

2025-04-28 Tags: gguf, parser, huggingface, llm, gpu by klotz

Docker Model Runner Brings Local LLMs to Your Desktop

Docker is making it easier for developers to run and test AI Large Language Models (LLMs) on their PCs with the launch of Docker Model Runner, a new beta feature in Docker Desktop 4.40 for Apple silicon-powered Macs. It also integrates the Model Context Protocol (MCP) for streamlined connections between AI agents and data sources.

2025-04-24 Tags: docker, llm, containers, model runner, mcp, localllama, llama.cpp, docker desktop, huggingface, oobabooga, ollama by klotz

Training Large Language Models with Interpreter Feedback using WebAssembly

This article details a method for training large language models (LLMs) for code generation using a secure, local WebAssembly-based code interpreter and reinforcement learning with Group Relative Policy Optimization (GRPO). It covers the setup, training process, evaluation, and potential next steps.

2025-04-04 Tags: huggingface, llm, training, code generation, webassembly, wasm, grpo, reinforcement learning, axolotl, code interpreter, fine-tuning, python by klotz

Ultrascale Playbook

A comprehensive guide to ultrascale machine learning, covering techniques, tools, and best practices.

2025-03-13 Tags: scale, machine learning, huggingface, production engineering, llm by klotz

Qodo-Embed-1-1.5B

Qodo-Embed-1-1.5B is a state-of-the-art code embedding model designed for retrieval tasks in the software development domain. It supports multiple programming languages and is optimized for natural language-to-code and code-to-code retrieval, making it highly effective for applications such as code search and retrieval-augmented generation.

2025-03-04 Tags: qodo-embed-1, code, embedding, llm, software development, huggingface by klotz

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Sergey Pletenev et al. explore how much new knowledge can be integrated into Large Language Models (LLMs) using Low-Rank Adaptation (LoRA). The study fine-tunes the Llama-3.1-8B-instruct model with varying amounts of new information while aiming to retain previously learned knowledge. The researchers found that mixing known and new facts in the training data yields the best results, but they also noted drawbacks: performance on external question-answering benchmarks declines, and the model regresses toward a few overrepresented answers when the data is skewed toward certain entities. Additionally, the fine-tuned model becomes more confident and declines to answer in only a few cases. These findings emphasize the need for careful consideration of training data composition and tuning parameters to balance the incorporation of new knowledge with maintaining overall model capabilities.

2025-02-22 Tags: large language models, lora, knowledge, question-answering benchmarks, overfitting, llm, huggingface by klotz

Qwen2.5-VL Technical Report

Qwen2.5-VL is a flagship model of the Qwen vision-language series, showcasing advancements in visual recognition, object localization, document parsing, and long-video comprehension. It introduces dynamic resolution processing and absolute time encoding, allowing it to handle complex inputs and maintain native resolution. Available in three sizes, it suits various applications from edge AI to high-performance computing, matching state-of-the-art models in document and diagram understanding while preserving strong linguistic capabilities.

2025-02-21 Tags: qwen2.5-vl, vision-language model, llm, huggingface, qwen, alibaba by klotz
