This guide covers three projects for serving large language models and vision-language models: vLLM, the llama.cpp server, and SGLang. For each project it gives usage instructions, features, and deployment methods.
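# List the model weights placed in ./models (the second line is the command's output)
ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model

# Install the Python dependencies
python3 -m pip install -r requirements.txt

# Convert the 7B model to ggml FP16 format
python3 convert.py models/7B/

# Quantize the FP16 model to 4 bits (q4_0 method)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

# Run inference, generating up to 128 tokens
./main -m ./models/7B/ggml-model-q4_0.bin -n 128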
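
The convert, quantize, and run commands above differ only in the model size and the quantization type, so they can be generated from two parameters. The following is a minimal sketch, not part of the original guide: the function name `pipeline_commands` and its two arguments are hypothetical, and it only prints the commands rather than running them.

```shell
# Hypothetical helper: print the convert -> quantize -> run commands
# for a given model size (default 7B) and quantization type (default q4_0).
# Paths follow the layout used above: ./models/<size>/ggml-model-<type>.bin
pipeline_commands() {
  size="${1:-7B}"
  quant="${2:-q4_0}"
  dir="./models/${size}"

  # Step 1: convert the original weights to ggml FP16
  printf 'python3 convert.py models/%s/\n' "$size"
  # Step 2: quantize the FP16 file to the requested type
  printf './quantize %s/ggml-model-f16.bin %s/ggml-model-%s.bin %s\n' \
    "$dir" "$dir" "$quant" "$quant"
  # Step 3: run inference on the quantized file, up to 128 tokens
  printf './main -m %s/ggml-model-%s.bin -n 128\n' "$dir" "$quant"
}
```

Calling `pipeline_commands 13B q4_0` prints the three commands for the 13B model. Reviewing the printed lines before piping them to `sh` keeps the sketch safe to try on a machine where the binaries or weights may be missing.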