Mistral.rs is a fast LLM inference platform. It supports inference on a variety of devices and quantization, and it is easy to use through an OpenAI API-compatible HTTP server and Python bindings. It supports the latest Llama and Phi models, along with LoRA and X-LoRA. The project aims to provide the fastest LLM inference platform possible.
It all started as a joke. I was in a group chat with a few friends, and we were talking about football (soccer, for the American readers). I joined the chat in the middle of a mildly heated discussion about the manager of a team one of my friends supports. It had been going on for a while, with seemingly no end in sight...