PygmalionAI's large-scale inference engine, designed to serve Pygmalion models to a large number of concurrent users at high speed. It integrates work from projects such as vLLM, TensorRT-LLM, xFormers, AutoAWQ, AutoGPTQ, SqueezeLLM, Exllamav2, TabbyAPI, AQLM, KoboldAI, Text Generation WebUI, and Megatron-LM.