Mistral.rs is a fast LLM inference platform that supports inference on a variety of devices, quantization, and easy integration via an OpenAI-compatible HTTP server and Python bindings. It supports the latest Llama and Phi models, along with X-LoRA and LoRA adapters. The project aims to be the fastest LLM inference platform available.
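Because the server is OpenAI-compatible, it can be queried with a plain chat-completions request. The sketch below builds such a payload and posts it; the base URL, port, and model name are assumptions for illustration, not values taken from the project.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    # Standard OpenAI chat-completions payload shape, which an
    # OpenAI-compatible server is expected to accept.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }


def query(base_url: str, payload: dict) -> dict:
    # POST the payload to the chat-completions endpoint.
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical local address and model name; requires a running server.
    payload = build_chat_request("mistral", "Hello!")
    print(query("http://localhost:1234", payload))
```

The same payload works with any OpenAI-compatible client library by pointing its base URL at the local server.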
Quantized models from