This pull request adds StreamingLLM support for the llamacpp and llamacpp_HF model loaders, allowing indefinite chatting with a model without re-evaluating the prompt once the context fills up.
This PR implements the StreamingLLM technique in the model loaders, handling context-length overflow and speeding up chat generation.
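The core trick behind StreamingLLM, as described in the two PRs above, is to keep the first few "attention sink" tokens plus a sliding window of recent tokens in the KV cache, discarding the middle instead of re-evaluating the whole prompt. A minimal sketch (the function and parameter names are hypothetical, not the loaders' actual API):

```python
def streaming_llm_trim(kv_cache, num_sinks=4, window=1020):
    """Trim a token-level KV cache the StreamingLLM way.

    Keeps the first `num_sinks` "attention sink" entries plus the
    most recent `window` entries, dropping everything in between.
    `kv_cache` is modeled here as a simple list of per-token entries.
    """
    if len(kv_cache) <= num_sinks + window:
        return kv_cache  # still fits; nothing to evict
    return kv_cache[:num_sinks] + kv_cache[-window:]

# e.g. a 10-token cache with 2 sinks and a 3-token window keeps
# tokens 0, 1 and the last three tokens
print(streaming_llm_trim(list(range(10)), num_sinks=2, window=3))
```

Because the sink tokens and the recent window stay in place, generation continues without recomputing attention over the evicted span, which is what makes indefinite chat possible.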
This project provides Dockerised deployment of oobabooga's text-generation-webui with pre-built images for Nvidia GPU, AMD GPU, Intel Arc, and CPU-only inference. It supports various extensions and offers easy deployment and updates.
A benchmark of large language models, sorted by size (on disk) for each score. Highlighted entries are on the Pareto frontier.
A web search extension for Oobabooga's text-generation-webui (now with nougat) that integrates web search results into conversations with the AI.
Steer LLM outputs toward a chosen topic or subject and enhance response capabilities using activation engineering, by adding steering vectors directly in the oobabooga text-generation-webui.
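The activation-engineering idea this extension builds on can be sketched in a few lines: derive a steering vector from the difference between two contrastive activations, then add a scaled copy of it to the hidden states at one layer during generation. This is a minimal illustration with hypothetical names; in practice the activations come from a transformer layer's hidden states, not toy arrays.

```python
import numpy as np

def steering_vector(h_positive, h_negative):
    """Contrastive steering vector: the activation difference between
    a prompt exhibiting the desired behavior and one that does not."""
    return h_positive - h_negative

def steer(hidden_states, vector, coeff=5.0):
    """Add the scaled steering vector to every token's hidden state
    at the chosen layer, nudging generation toward the target topic."""
    return hidden_states + coeff * vector

# toy example: 2 tokens, hidden size 4
h_pos = np.ones(4)
h_neg = np.zeros(4)
vec = steering_vector(h_pos, h_neg)
steered = steer(np.zeros((2, 4)), vec, coeff=2.0)
print(steered)  # every component shifted by coeff * vec
```

The coefficient controls how strongly the output is pushed toward the topic; too large a value typically degrades fluency, so it is usually tuned per model and layer.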
An extension for oobabooga/text-generation-webui that enables the LLM to search the web using DuckDuckGo
An extension that automatically unloads and reloads your model, freeing up VRAM for other programs.
discord-llm-chatbot is a Discord bot that lets you chat with Large Language Models (LLMs) directly in your Discord server. It supports various LLMs, including those behind the OpenAI, Mistral, and Anthropic APIs, as well as local backends like ollama, oobabooga, Jan, and LM Studio. The bot offers a reply-based chat system, a customizable system prompt, and seamless threading of conversations. It also supports image and text file attachments, and streamed responses.