A Vim plugin that provides local LLM-assisted code and text completion using llama.cpp server instances. It supports features like auto-suggest on cursor movement, manual toggle with Ctrl+F, accepting suggestions with Tab/Shift+Tab, configurable context scope, and performance stats display.
llama-swap is a lightweight, transparent proxy server that provides automatic model swapping to llama.cpp's server. It allows you to easily switch between different language models on a local server, supporting OpenAI API compatible endpoints and offering features like model grouping, automatic unloading, and a web UI for monitoring.
This blog post details the training of 'Chess Llama', a small Llama model designed to play chess. It covers the inspiration behind the project (Chess GPT), the dataset used (Lichess Elite database), the training process using Huggingface Transformers, and the model's performance (Elo rating of 1350-1400). It also includes links to try the model and view the source code.
A detailed comparison of the architectures of recent large language models (LLMs) including DeepSeek-V3, OLMo 2, Gemma 3, Mistral Small 3.1, Llama 4, Qwen3, SmolLM3, and Kimi 2, focusing on key design choices and their impact on performance and efficiency.
Transformer Lab is an open-source application for advanced LLM engineering, allowing users to interact, train, fine-tune, and evaluate large language models on their own computer. It supports various models, hardware, and inference engines and includes features like RAG, dataset building, and a REST API.
An analysis of how well different AI systems perform in describing images and answering questions about them. The article compares ChatGPT, Gemini, Llama, and Claude using four images: a hand, a bottle of wine, a piece of pastry, and a flower.
A script utilizing OpenAI's Llama models to interact within a terminal environment, allowing the models to execute Python code and communicate based on predefined prompts.
A guided series of tutorials/notebooks to build a PDF to Podcast workflow using Llama models for text processing, transcript writing, dramatization, and text-to-speech conversion.
Simon Willison explains how to use the mistral.rs library in Rust to run the Llama Vision model on a Mac M2 laptop. He provides a detailed example and discusses the memory usage and GPU utilization.
This article compares the performance of smaller language models Gemma, Llama 3, and Mistral on reading comprehension tasks. The author highlights the trend of smaller, more accessible models and discusses Apple's recent foray into the field with its own proprietary model.