llama-cpp-python offers a web server that aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).
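As a sketch of what "drop-in replacement" means in practice: the server exposes OpenAI-style endpoints such as `/v1/chat/completions`, so a request built for the OpenAI API can simply be pointed at the local host instead. The host, port, and model name below are assumptions (the llama-cpp-python server defaults to port 8000 and serves whichever GGUF model it was started with).

```python
import json
import urllib.request

# Assumed local llama-cpp-python server address (default port is 8000).
BASE_URL = "http://localhost:8000/v1"

# The same payload shape the OpenAI chat completions API accepts.
payload = {
    "model": "local-model",  # placeholder; the server answers for the model it loaded
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is GGUF?"},
    ],
    "temperature": 0.7,
}

request = urllib.request.Request(
    BASE_URL + "/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once a server is running locally, e.g. started with:
#   python -m llama_cpp.server --model <path-to-model>.gguf
#
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

Because the endpoints mirror the OpenAI API, the official `openai` client library also works against this server by overriding its base URL.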
A deep dive into model quantization with GGUF and llama.cpp and model evaluation with LlamaIndex
- create a custom base image for a Cloud Workstation environment using a Dockerfile
Uses:
- Quantized models from