A high-performance deployment of the vLLM inference engine, optimized for serving large language models at scale.
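As a minimal sketch of what running a model through vLLM looks like, the snippet below uses vLLM's offline inference API; the model name is illustrative, and any supported Hugging Face causal LM can be substituted.

```python
from vllm import LLM, SamplingParams

# Model name is an example; swap in any vLLM-supported checkpoint.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

# Batched generation: vLLM schedules all prompts through its engine.
outputs = llm.generate(["Explain continuous batching in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```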
The LLooM README describes a tool that uses raw LLM logits to weave generation threads probabilistically. It covers how to run LLooM against several backends, including vLLM, llama.cpp, and the OpenAI API, and documents LLooM's parameters and configuration.
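To illustrate the core idea of branching on raw logits (not LLooM's actual API), here is a conceptual sketch: every next token whose probability clears a cutoff spawns its own continuation thread. The `next_token_logprobs` function is a hypothetical stand-in for a real backend call.

```python
import math

# Hypothetical stand-in for a backend call (vLLM, llama.cpp, OpenAI API):
# returns {token: logprob} for the next token after `text`.
def next_token_logprobs(text: str) -> dict[str, float]:
    toy = {"cat": 0.5, "dog": 0.3, "fish": 0.2}  # toy distribution
    return {tok: math.log(p) for tok, p in toy.items()}

def expand_threads(prompt: str, cutoff: float = 0.25, depth: int = 3):
    """Breadth-first expansion: keep every continuation whose next-token
    probability clears `cutoff`, so threads branch and weave in parallel."""
    threads = [(prompt, 0.0)]  # (text so far, cumulative logprob)
    for _ in range(depth):
        nxt = []
        for text, lp in threads:
            for tok, tok_lp in next_token_logprobs(text).items():
                if math.exp(tok_lp) >= cutoff:
                    nxt.append((text + " " + tok, lp + tok_lp))
        threads = nxt
    # Most probable threads first.
    return sorted(threads, key=lambda t: -t[1])

print(expand_threads("The pet is a")[:3])
```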