The article discusses the importance of fine-tuning machine learning models for optimal inference performance and explores popular tools like vLLM, TensorRT, ONNX Runtime, TorchServe, and DeepSpeed.
This article shows how to test two small language models, the 3.8B Phi-3 and the 8B Llama-3, on a PC and a Raspberry Pi using LlamaCpp and ONNX. Written by Dmitrii Eliuseev.