klotz: vllm*

7 bookmark(s)

  1. This guide covers three prominent projects for serving large language models and vision-language models: vLLM, the llama.cpp server, and SGLang. Each project offers distinct functionality and is explained with usage instructions, features, and deployment methods.
    2024-09-30 by klotz
  2. This repository contains scripts for benchmarking the performance of large language models (LLMs) served using vLLM.
    2024-08-24 by klotz
  3. A startup called Backprop has demonstrated that a single Nvidia RTX 3090 GPU, released in 2020, can handle serving a modest large language model (LLM) like Llama 3.1 8B to over 100 concurrent users with acceptable throughput. This suggests that expensive enterprise GPUs may not be necessary for scaling LLMs to a few thousand users.
  4. High-performance deployment of the vLLM serving engine, optimized for serving large language models at scale (a minimal usage sketch follows this list).
  5. This blog post benchmarks and compares the performance of SGLang, TensorRT-LLM, and vLLM for serving large language models (LLMs). SGLang demonstrates superior or competitive performance in offline and online scenarios, often outperforming vLLM and matching or exceeding TensorRT-LLM.
    2024-07-27 by klotz
  6. This page provides information about LLooM, a tool that uses raw LLM logits to weave threads in a probabilistic way. It includes instructions for using LLooM with various backends, such as vLLM, llama.cpp, and OpenAI, and the README explains LLooM's parameters and configuration (a toy sketch of logit-based thread expansion follows this list).
  7. 2024-01-10 by klotz
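
For entry 4: vLLM also exposes an offline Python API alongside its OpenAI-compatible server. A minimal sketch, assuming the vllm package is installed and using a placeholder model name:

    # Minimal offline-inference sketch with vLLM's Python API.
    # The model name is a placeholder; substitute any model vLLM can load.
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")  # loads weights onto the GPU
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    outputs = llm.generate(["Explain paged attention in one sentence."], params)
    for out in outputs:
        print(out.outputs[0].text)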
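
For entry 6: the idea of weaving threads from raw logits can be illustrated with a toy sketch. This is not LLooM's actual code; the vocabulary, the stand-in logit function, and the cutoff value are illustrative assumptions. Instead of sampling a single token, every continuation whose probability clears a threshold is expanded into its own thread:

    # Toy illustration of threshold-based "thread weaving" over raw logits.
    # Not LLooM's implementation; vocabulary, logits, and cutoff are made up.
    import numpy as np

    VOCAB = ["the", "cat", "dog", "sat", "ran", "."]

    def fake_logits(prefix):
        # Stand-in for a real LLM call returning next-token logits.
        rng = np.random.default_rng(abs(hash(tuple(prefix))) % (2**32))
        return rng.normal(size=len(VOCAB))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def weave(prefix, depth=2, cutoff=0.2):
        """Expand every next token whose probability exceeds `cutoff`,
        yielding multiple threads instead of one sampled continuation."""
        if depth == 0:
            return [prefix]
        probs = softmax(fake_logits(prefix))
        threads = []
        for tok, p in zip(VOCAB, probs):
            if p >= cutoff:
                threads += weave(prefix + [tok], depth - 1, cutoff)
        return threads or [prefix]  # keep the prefix if nothing clears the cutoff

    for t in weave(["the"]):
        print(" ".join(t))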
