This project provides Dockerised deployment of oobabooga's text-generation-webui with pre-built images for Nvidia GPU, AMD GPU, Intel Arc, and CPU-only inference. It supports various extensions and offers easy deployment and updates.
An extension for oobabooga/text-generation-webui that enables the LLM to search the web using DuckDuckGo.
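A minimal sketch of how such an extension could hook into the webui. `input_modifier` is a real text-generation-webui extension hook, but the trigger keyword, the exact signature, and the `duckduckgo_search` usage below are illustrative assumptions, not the extension's actual code:

```python
# extensions/web_search/script.py -- a sketch, not the real extension.
from duckduckgo_search import DDGS

TRIGGER = "search:"  # hypothetical trigger prefix, chosen for illustration

def input_modifier(string, state, is_chat=False):
    """Intercept the user's message before it reaches the model."""
    if not string.lower().startswith(TRIGGER):
        return string
    query = string[len(TRIGGER):].strip()
    # duckduckgo_search returns dicts with 'title', 'href', and 'body' keys.
    results = DDGS().text(query, max_results=3)
    context = "\n".join(f"- {r['title']}: {r['body']}" for r in results)
    # Prepend the results so the model can ground its answer in them.
    return f"Web search results for '{query}':\n{context}\n\nQuestion: {query}"
```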
An extension that automatically unloads and reloads your model, freeing up VRAM for other programs.
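A rough sketch of the idea, assuming the webui's internal `modules.shared` and `modules.models.load_model`/`unload_model` (these exist in the codebase, though their signatures have changed across versions); the idle-timer logic here is illustrative, not the extension's actual implementation:

```python
# Unload the model after a period of inactivity, reload on the next request.
import threading
from modules import shared
from modules.models import load_model, unload_model

IDLE_SECONDS = 300  # hypothetical idle timeout before freeing VRAM
_timer = None

def _unload():
    if shared.model is not None:
        unload_model()  # frees the model's VRAM for other programs

def _reset_timer():
    global _timer
    if _timer is not None:
        _timer.cancel()
    _timer = threading.Timer(IDLE_SECONDS, _unload)
    _timer.daemon = True
    _timer.start()

def input_modifier(string, state, is_chat=False):
    # Reload on demand if the model was unloaded while idle.
    if shared.model is None and shared.model_name not in (None, "None"):
        shared.model, shared.tokenizer = load_model(shared.model_name)
    _reset_timer()
    return string
```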
chat - chat directly; the character card is your prompt
instruct - chat between "you" and "assistant" using the model's prompt format
chat-instruct - chat with you and a character card as the prompt, but with the instruct template applied, i.e. "you are an AI playing x character, respond as the character would" converted to Alpaca, Wizard, or whatever format the model expects
There is no single best mode, but for factual information you probably want to stick with instruct. chat-instruct doesn't necessarily play characters better or make them write longer; it's somewhat hit or miss, and one mode may work better than another for a particular model and prompt. The sketch below illustrates how each mode assembles the final prompt.
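To make the three modes concrete, here is a toy illustration of how each one might build the prompt sent to the model. The character card, the Alpaca-style template, and the formatting are all assumptions for illustration; the webui's real templates differ:

```python
# Toy prompt assembly for the three chat modes (not the webui's actual code).

CHARACTER = "You are Sherlock Holmes, a brilliant detective."
ALPACA = "### Instruction:\n{prompt}\n\n### Response:\n"  # generic instruct template

def build_prompt(mode: str, user_message: str) -> str:
    if mode == "chat":
        # The character card alone is the prompt; no instruct template.
        return f"{CHARACTER}\nYou: {user_message}\nSherlock:"
    if mode == "instruct":
        # Plain you/assistant exchange wrapped in the model's prompt format.
        return ALPACA.format(prompt=user_message)
    if mode == "chat-instruct":
        # The character card becomes an instruction ("play this character"),
        # which is then wrapped in the instruct template.
        command = (f"Continue the chat below, writing as the character.\n"
                   f"{CHARACTER}\nYou: {user_message}\nSherlock:")
        return ALPACA.format(prompt=command)
    raise ValueError(mode)

print(build_prompt("chat-instruct", "Who stole the jewels?"))
```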
How to get oobabooga/text-generation-webui running on Windows or Linux with LLaMA-30B in 4-bit mode via GPTQ-for-LLaMa on an RTX 3090, from start to finish.
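A quick back-of-the-envelope check on why 4-bit quantisation is what makes a 30B model fit on a 3090. The numbers are approximate and cover weights only; the KV cache and CUDA overhead add a few more GB on top:

```python
# Rough VRAM arithmetic for a 30B-parameter model on an RTX 3090 (24 GiB).
params = 30e9            # ~30 billion parameters
bits_per_param = 4       # GPTQ 4-bit quantisation
weight_gib = params * bits_per_param / 8 / 1024**3
print(f"4-bit weights: ~{weight_gib:.1f} GiB")            # ~14.0 GiB, fits
print(f"fp16 weights:  ~{params * 2 / 1024**3:.1f} GiB")  # ~55.9 GiB, does not
```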