This collection, curated by prism-ml, features a series of 1-bit Bonsai models designed for efficient text generation. It spans several architectures and sizes, including 8B, 4B, and 1.7B parameter versions, compressed through extreme quantization. Available in formats such as GGUF and MLX-1bit, the models are heavily compressed to minimize memory and compute requirements, making them well suited to running large-language-model tasks on hardware with limited resources. The collection serves as a hub for exploring ultra-compact models in the evolving landscape of efficient inference.
The Bonsai Demo repository provides a streamlined way to run Bonsai language models locally, on macOS via Metal and on Linux or Windows via CUDA. It supports multiple model sizes (8B, 4B, and 1.7B) in both GGUF and MLX formats, covering a range of hardware setups. The repository includes automated setup scripts that manage dependencies, Python environments, and model downloads from HuggingFace. Users can run inference through command-line tools, start a built-in chat server, or integrate with Open WebUI for a more interactive experience. The project is optimized for efficient, high-performance local execution on Apple Silicon and CUDA-enabled GPUs.
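As a rough illustration of the command-line route, the sketch below loads a GGUF checkpoint with llama-cpp-python; the file path and prompt are hypothetical placeholders, and the demo repository's own scripts may wrap a different backend.

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The GGUF filename below is a hypothetical placeholder for a downloaded Bonsai checkpoint.
from llama_cpp import Llama

llm = Llama(
    model_path="models/bonsai-4b.gguf",  # hypothetical path to a local GGUF file
    n_gpu_layers=-1,                     # offload all layers to Metal/CUDA when available
    n_ctx=4096,                          # context window size
)

result = llm(
    "Explain 1-bit quantization in one sentence.",
    max_tokens=128,
    temperature=0.7,
)
print(result["choices"][0]["text"])
```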
Alibaba’s Qwen team released the Qwen 3 model family, offering a range of sizes and capabilities. The article discusses the models’ features and performance, as well as the well-coordinated release across the LLM ecosystem, highlighting the trend of better models running on the same hardware.
Transformer Lab is an open-source application for advanced LLM engineering, allowing users to interact with, train, fine-tune, and evaluate large language models on their own computer. It supports various models, hardware, and inference engines and includes features like RAG, dataset building, and a REST API.
Unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, NVIDIA, Raspberry Pi, pretty much any device!
SmolVLM2 represents a shift in video understanding technology by introducing efficient models that can run on various devices, from phones to servers. The release includes models of three sizes (2.2B, 500M, and 256M) with Python and Swift API support. These models offer video understanding capabilities with reduced memory consumption, supported by a suite of demo applications for practical use.
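To make the Python side concrete, here is a minimal sketch of video inference via the transformers library, following its image-text-to-text interface; the model identifier, video path, and generation settings are assumptions to check against the official model cards.

```python
# Sketch: video Q&A with a SmolVLM2 checkpoint via transformers.
# Model ID and video path are assumptions; see the official model cards for exact usage.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # assumed Hub repo name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")  # or "mps" / "cpu" depending on hardware

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "path": "clip.mp4"},  # hypothetical local video file
        {"type": "text", "text": "Describe what happens in this video."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```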
MLX-VLM: A package for running Vision LLMs on Mac using MLX.
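For context, a typical call might look like the sketch below; the Hub model ID and image path are assumptions, and the generate argument order has changed across mlx-vlm releases, so check the project README before relying on it.

```python
# Sketch: running a vision LLM with mlx-vlm on Apple Silicon (pip install mlx-vlm).
# The model ID and image path are hypothetical; argument details vary by mlx-vlm version.
from mlx_vlm import load, generate

model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")  # assumed Hub repo

output = generate(
    model,
    processor,
    "Describe this image.",  # prompt
    "photo.jpg",             # local image path
    max_tokens=128,
    verbose=False,
)
print(output)
```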