This collection, curated by prism-ml, features a series of 1-bit Bonsai models designed for efficient text generation. It spans several model sizes, including 8B, 4B, and 1.7B parameter versions, compressed through extreme quantization. Available in formats such as GGUF and MLX-1bit, the models trade some output quality for a drastically reduced memory and compute footprint, making them well suited to running language-model workloads on hardware with limited resources. The collection serves as a hub for exploring ultra-compact, heavily compressed models in the evolving landscape of efficient inference.
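To put the compression in perspective, here is a rough back-of-the-envelope estimate of weight storage at different precisions. This is illustrative only: real model files also contain metadata, quantization scales, and some tensors kept at higher precision, so actual sizes will be somewhat larger.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (decimal) at a given precision."""
    return n_params * bits_per_weight / 8 / 1e9

# An 8B-parameter model: ~16 GB at fp16 vs ~1 GB at 1 bit per weight.
fp16_gb = weight_memory_gb(8e9, 16)
one_bit_gb = weight_memory_gb(8e9, 1)
print(f"fp16: {fp16_gb:.1f} GB, 1-bit: {one_bit_gb:.1f} GB")
```

The same arithmetic explains why the 1.7B variant can comfortably fit in well under a gigabyte of weight memory, putting it within reach of very modest hardware.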
The Bonsai Demo repository provides a streamlined way to run Bonsai language models locally, on macOS via Metal and on Linux or Windows via CUDA. It supports multiple model sizes (8B, 4B, and 1.7B) in both GGUF and MLX formats, covering a range of hardware setups. Automated setup scripts manage dependencies, Python environments, and model downloads from HuggingFace. Users can run inference from command-line tools, start a built-in chat server, or connect that server to Open WebUI for a more interactive experience. The project is optimized for efficient local execution on Apple Silicon and CUDA-enabled GPUs.
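Since the repository integrates with Open WebUI, the built-in chat server presumably exposes an OpenAI-compatible endpoint, which is what Open WebUI expects from a local backend. As a sketch under that assumption, a minimal client request might look like the following; the port, path, and model name here are hypothetical and should be checked against the repository's own documentation.

```python
import json
import urllib.request

def chat_request(prompt: str,
                 model: str = "bonsai-1.7b",  # hypothetical model identifier
                 url: str = "http://localhost:8080/v1/chat/completions"):
    """Build an OpenAI-style chat completion request for a local server.

    Returns an unsent urllib Request so the payload can be inspected
    without a running server.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it against a running server:
# response = urllib.request.urlopen(chat_request("Hello, Bonsai!"))
```

Using the standard chat-completions message schema means the same server works unchanged with Open WebUI, the OpenAI Python client, or a plain curl call.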