This collection, curated by prism-ml, features a series of 1-bit Bonsai models designed for efficient text generation. It spans several architectures and sizes, including 8B, 4B, and 1.7B parameter versions, compressed through extreme quantization. Available in formats such as GGUF and MLX-1bit, the models are heavily compressed to minimize memory and compute requirements, making them well suited to running large-language-model tasks on hardware with limited resources. The collection serves as a hub for exploring ultra-compact models in the evolving landscape of efficient inference.
The Bonsai Demo repository provides a streamlined way to run Bonsai language models locally, on macOS via Metal and on Linux or Windows via CUDA. It supports multiple model sizes (8B, 4B, and 1.7B) in both GGUF and MLX formats, covering a range of hardware setups. The repository includes automated setup scripts that manage dependencies, Python environments, and model downloads from HuggingFace. Users can run inference through command-line tools, start a built-in chat server, or integrate with Open WebUI for a more interactive experience. The project is optimized for efficient, high-performance local execution on Apple Silicon and CUDA-enabled GPUs.
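As a rough illustration of the command-line route, the sketch below loads a GGUF checkpoint with llama-cpp-python; the file path and prompt are hypothetical placeholders, and the demo repository's own scripts may wrap a different backend.

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The GGUF filename below is a hypothetical placeholder for a downloaded Bonsai checkpoint.
from llama_cpp import Llama

llm = Llama(
    model_path="models/bonsai-4b.gguf",  # hypothetical path to a local GGUF file
    n_gpu_layers=-1,                     # offload all layers to Metal/CUDA when available
    n_ctx=4096,                          # context window size
)

result = llm(
    "Explain 1-bit quantization in one sentence.",
    max_tokens=128,
    temperature=0.7,
)
print(result["choices"][0]["text"])
```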
Alibaba’s Qwen team released the Qwen 3 model family, offering a range of sizes and capabilities. The article discusses the models’ features and performance, as well as the well-coordinated release across the LLM ecosystem, highlighting the trend of better models running on the same hardware.
Transformer Lab is an open-source application for advanced LLM engineering, allowing users to interact with, train, fine-tune, and evaluate large language models on their own computer. It supports various models, hardware, and inference engines and includes features like RAG, dataset building, and a REST API.
Unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, NVIDIA, Raspberry Pi, pretty much any device!
SmolVLM2 represents a shift in video understanding technology by introducing efficient models that can run on various devices, from phones to servers. The release includes models of three sizes (2.2B, 500M, and 256M) with Python and Swift API support. These models offer video understanding capabilities with reduced memory consumption, supported by a suite of demo applications for practical use.
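To make the Python side concrete, here is a minimal sketch of video inference via the transformers library, following its image-text-to-text interface; the model identifier, video path, and generation settings are assumptions to check against the official model cards.

```python
# Sketch: video Q&A with a SmolVLM2 checkpoint via transformers.
# Model ID and video path are assumptions; see the official model cards for exact usage.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # assumed Hub repo name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")  # or "mps" / "cpu" depending on hardware

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "path": "clip.mp4"},  # hypothetical local video file
        {"type": "text", "text": "Describe what happens in this video."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```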
MLX-VLM: A package for running Vision LLMs on Mac using MLX.
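For context, a typical call might look like the sketch below; the Hub model ID and image path are assumptions, and the generate argument order has changed across mlx-vlm releases, so check the project README before relying on it.

```python
# Sketch: running a vision LLM with mlx-vlm on Apple Silicon (pip install mlx-vlm).
# The model ID and image path are hypothetical; argument details vary by mlx-vlm version.
from mlx_vlm import load, generate

model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")  # assumed Hub repo

output = generate(
    model,
    processor,
    "Describe this image.",  # prompt
    "photo.jpg",             # local image path
    max_tokens=128,
    verbose=False,
)
print(output)
```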