The Bonsai Demo repository provides a streamlined way to run Bonsai language models locally on macOS (via Metal) and on Linux or Windows (via CUDA). It supports multiple model sizes (8B, 4B, and 1.7B) in both GGUF and MLX formats, covering a range of hardware setups. Automated setup scripts handle dependencies, the Python environment, and model downloads from HuggingFace. Users can run inference from the command line, start the built-in chat server, or integrate with Open WebUI for a more interactive experience. The project is optimized for efficient local execution on Apple Silicon and CUDA-enabled GPUs.
LLMII labels and indexes images using a local LLM, with no cloud service or external database required. A visual language model running on your own machine generates captions and keywords for every image in a directory tree, and the generated information is then written into each image file's metadata.
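The metadata-writing step can be sketched as follows. This is an illustration only, assuming exiftool-style XMP/IPTC tags as the target; LLMII's actual mechanism and field choices may differ, and `build_tag_command` is a hypothetical helper.

```python
import shlex

def build_tag_command(image_path, caption, keywords):
    """Build an exiftool command that writes a model-generated caption and
    keyword list into an image file's metadata.

    Hypothetical sketch: field names follow common exiftool conventions
    (XMP-dc:Description for the caption, IPTC Keywords for tags)."""
    cmd = ["exiftool", f"-XMP-dc:Description={caption}"]
    for kw in keywords:
        # "+=" appends to the Keywords list instead of overwriting it
        cmd.append(f"-Keywords+={kw}")
    cmd.append(image_path)
    return cmd

cmd = build_tag_command("photos/cat.jpg", "A cat on a windowsill", ["cat", "window"])
print(shlex.join(cmd))
```

In a real run, the caption and keywords would come from the local visual language model, and the command would be executed once per file in the directory tree.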
llama-swap is a lightweight, transparent proxy server that adds automatic model swapping to llama.cpp's server. It lets you switch easily between different language models on a local machine, exposing OpenAI-compatible API endpoints and offering features such as model grouping, automatic unloading, and a web UI for monitoring.
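Because the proxy is OpenAI-compatible, a client only needs to change the `model` field to trigger a swap. A minimal stdlib sketch, assuming the default localhost address and a placeholder model name:

```python
import json
import urllib.request

def chat_request(model, prompt, base_url="http://localhost:8080"):
    """Build an OpenAI-style chat completion request aimed at a llama-swap
    proxy. The proxy inspects the "model" field and loads/swaps the backing
    llama.cpp model before forwarding. Host, port, and model name here are
    placeholders for illustration."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = chat_request("llama-8b-q4", "Hello!")
# urllib.request.urlopen(req) would send it once the proxy is running
print(req.full_url)
```

The `/v1/chat/completions` path is the standard OpenAI-compatible endpoint; from the client's perspective the swap is invisible.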
This pull request adds StreamingLLM support for the llamacpp and llamacpp_HF models, aiming to improve performance and reliability. With the changes, you can chat with the model indefinitely without re-evaluating the prompt.
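The core idea behind StreamingLLM is a cache-eviction policy: keep the first few tokens ("attention sinks") plus a sliding window of the most recent tokens, so the context never overflows and the prompt never needs re-evaluation. A minimal sketch of that policy, with illustrative parameter names and values (not the PR's actual code):

```python
def streaming_kv_keep(cache, n_sink=4, window=8):
    """StreamingLLM-style eviction: retain the first n_sink cache entries
    (attention sinks) plus the most recent `window` entries, discarding
    everything in between. Values here are illustrative."""
    if len(cache) <= n_sink + window:
        return cache  # nothing to evict yet
    return cache[:n_sink] + cache[-window:]

tokens = list(range(20))
print(streaming_kv_keep(tokens))
# → [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

The sink tokens stabilize attention scores, which is why a plain sliding window alone degrades quality but this scheme does not.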
discord-llm-chatbot is a Discord bot that lets you chat with Large Language Models (LLMs) directly in your Discord server. It supports a range of backends, including the OpenAI, Mistral, and Anthropic APIs as well as local options such as ollama, oobabooga, Jan, and LM Studio. The bot offers a reply-based chat system, a customizable system prompt, and seamless threading of conversations. It also supports image and text-file attachments and streamed responses.
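A reply-based chat system typically reconstructs context by walking the reply chain from the newest message back to the root. A sketch of that idea, assuming a simplified in-memory message store (the dict layout and `build_history` helper are hypothetical, not the bot's actual code):

```python
def build_history(messages, leaf_id, max_depth=20):
    """Walk a Discord-style reply chain from the newest message back to the
    root and return an LLM chat history, oldest message first.

    `messages` maps message id -> {"author", "content", "reply_to"};
    this structure is an illustrative stand-in for real Discord objects."""
    chain = []
    mid = leaf_id
    while mid is not None and len(chain) < max_depth:
        msg = messages[mid]
        role = "assistant" if msg["author"] == "bot" else "user"
        chain.append({"role": role, "content": msg["content"]})
        mid = msg["reply_to"]  # follow the reply reference upward
    return list(reversed(chain))

msgs = {
    1: {"author": "alice", "content": "Hi!", "reply_to": None},
    2: {"author": "bot", "content": "Hello, Alice.", "reply_to": 1},
    3: {"author": "alice", "content": "Tell me a joke.", "reply_to": 2},
}
print(build_history(msgs, 3))
```

Capping the walk with `max_depth` keeps long threads from exceeding the model's context window.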
This article, written by Dmitrii Eliuseev, shows how to test small language models (the 3.8B Phi-3 and 8B Llama-3) on a PC and a Raspberry Pi using LlamaCpp and ONNX.