klotz: inference* + llm*


  1. Investigation into the effect of DDR5 speed on local LLM inference speed.
  2. d-Matrix's Corsair platform aims to transform the economics of large-scale AI inference, promising fast, commercially viable, and sustainable performance at scale.
    2025-01-26 by klotz
  3. The article discusses the importance of fine-tuning machine learning models for optimal inference performance and explores popular tools like vLLM, TensorRT, ONNX Runtime, TorchServe, and DeepSpeed.
  4. Simon Willison explains how to use the mistral.rs library in Rust to run the Llama Vision model on a Mac M2 laptop. He provides a detailed example and discusses the memory usage and GPU utilization.
  5. TabbyAPI is a FastAPI-based application for generating text with a large language model (LLM) via the ExLlamaV2 backend. It supports various model types and features such as HuggingFace model downloading and embedding model support.
    2024-09-25 by klotz
  6. This article explains how to accurately quantize a Large Language Model (LLM) and convert it to the GGUF format for efficient CPU inference. It covers using an importance matrix (imatrix) and K-Quantization method with Gemma 2 Instruct as an example, while highlighting its applicability to other models like Qwen2, Llama 3, and Phi-3.
    2024-09-14 by klotz
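The imatrix + K-quantization workflow described in that article maps onto two llama.cpp command-line tools. The sketch below only prints typical invocations; the file names and paths are illustrative placeholders, not taken from the article:

```python
# Illustrative llama.cpp quantization workflow (file names are placeholders).
# Step 1: compute an importance matrix (imatrix) from a calibration corpus.
imatrix_cmd = [
    "llama-imatrix",
    "-m", "gemma-2-9b-it-f16.gguf",   # full-precision source model
    "-f", "calibration.txt",          # calibration text
    "-o", "imatrix.dat",              # importance matrix output
]

# Step 2: quantize to a K-quant type, guided by the importance matrix.
quantize_cmd = [
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "gemma-2-9b-it-f16.gguf",
    "gemma-2-9b-it-Q4_K_M.gguf",
    "Q4_K_M",                         # K-quantization type
]

for cmd in (imatrix_cmd, quantize_cmd):
    print(" ".join(cmd))
```

The same two-step shape applies to the other models the article mentions (Qwen2, Llama 3, Phi-3); only the model files change.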
  7. Inference.net offers LLM inference tokens for models like Llama 3.1 at a 50-90% discount from other providers. They aggregate unused compute resources from data centers to offer fast, reliable, and affordable inference services.

    "inference.net is a wholesaler of LLM inference tokens for models like Llama 3.1. We provide inference services at a 50-90% discount from what you would pay together.ai or groq."

    "We sell tokens in 10 billion token increments. The current cost per 10 billion tokens for an 8B model is $200."
    2024-08-14 by klotz
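A quick sanity check on that quote, using only the numbers given above: $200 per 10 billion tokens works out to two cents per million tokens.

```python
# inference.net's quoted rate: $200 per 10 billion tokens for an 8B model.
price_usd = 200
tokens = 10_000_000_000

cost_per_million = price_usd / tokens * 1_000_000
print(f"${cost_per_million:.2f} per million tokens")  # $0.02 per million tokens
```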
  8. The author explores using Gemma 2 with Mozilla's llamafile on AWS Lambda for serverless AI inference.
    2024-07-08 by klotz
  9. Explore the best LLM inference engines and servers available to deploy and serve LLMs in production, including vLLM, TensorRT-LLM, Triton Inference Server, RayLLM with RayServe, and HuggingFace Text Generation Inference.
    2024-06-21 by klotz
  10. Podman AI Lab makes it easy to work with Large Language Models (LLMs) on a local developer workstation. It provides a catalog of recipes and a curated list of open-source models, and lets you experiment with and compare models.
    2024-05-11 by klotz


SemanticScuttle - klotz.me: Tags: inference + llm
