Tags: inference* + performance*


  1. A detailed guide for running the new gpt-oss models locally with the best performance using `llama.cpp`. The guide covers a wide range of hardware configurations and provides CLI argument explanations and benchmarks for Apple Silicon devices.
  2. LocalScore is an open benchmark to evaluate local AI task performance across various hardware configurations, measuring Prompt Processing speed, Token Generation speed, Time-to-First-Token (TTFT), and a combined LocalScore.
  3. Investigation into the effect of DDR5 speed on local LLM inference speed.
  4. The article discusses the importance of fine-tuning machine learning models for optimal inference performance and explores popular tools like vLLM, TensorRT, ONNX Runtime, TorchServe, and DeepSpeed.
  5. 2023-11-18, by klotz
  6. 2023-10-13, by klotz


