Tags: llm* + inference*

  1. This Space demonstrates a simple method for embedding text with an LLM (large language model) via the Hugging Face Inference API, showing how to convert text into numerical vector representations useful for semantic search and similarity comparisons.
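
A minimal sketch of that workflow, assuming the huggingface_hub Python client and a sentence-transformers model id (both are illustrative choices, not details taken from the bookmarked Space):

```python
import numpy as np
from huggingface_hub import InferenceClient

# The model id and token placeholder are assumptions for illustration;
# any feature-extraction model on the Hub should work the same way.
client = InferenceClient(
    model="sentence-transformers/all-MiniLM-L6-v2",
    token="hf_...",  # your Hugging Face API token
)

def embed(text: str) -> np.ndarray:
    # feature_extraction returns the embedding for the input text; some models
    # return per-token vectors instead, which would need mean-pooling first.
    return np.asarray(client.feature_extraction(text))

a = embed("How do I reset my password?")
b = embed("Steps for password recovery")
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```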

  2. NVIDIA DGX Spark is a desktop-friendly AI supercomputer powered by the NVIDIA GB10 Grace Blackwell Superchip, delivering 1000 AI TOPS of performance with 128GB of memory. It is designed for prototyping, fine-tuning, and inference of large AI models.

  3. Can a much smaller model compete with DeepSeek R1? Alibaba's Qwen team aims to find out with its latest release, QwQ. Despite having a fraction of R1's claimed 671 billion parameters, Alibaba touts the comparatively compact 32-billion-parameter 'reasoning' model as outperforming R1 on select math, coding, and function-calling benchmarks.

    2025-03-17 by klotz
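
A hedged sketch of trying the model locally with Hugging Face transformers; the repo id Qwen/QwQ-32B and the prompt are assumptions, and a 32B model needs substantial GPU memory:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # assumed Hub repo id; verify against the Qwen org
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 30?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous budget.
output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```
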
  4. The NVIDIA Jetson Orin Nano Super is highlighted as a compact, powerful computing platform for edge AI. It supports large-scale inference tasks at the edge when paired with high-capacity storage such as the Solidigm 122.88TB SSD. The review explores applications including wildlife conservation, surveillance, and AI model distribution, emphasizing its potential in real-world deployments.

  5. The Cerebras API offers low-latency AI model inference using Cerebras Wafer-Scale Engines and CS-3 systems, providing access to Meta's Llama models for conversational applications.

    2025-02-08 by klotz
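
Cerebras documents an OpenAI-compatible endpoint, so a standard client can talk to it; the base URL, key format, and model name below are assumptions to check against the current docs:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key="csk-...",  # your Cerebras API key
)

response = client.chat.completions.create(
    model="llama3.1-8b",  # assumed model id for a hosted Llama model
    messages=[{"role": "user", "content": "What makes wafer-scale inference fast?"}],
)
print(response.choices[0].message.content)
```
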
  6. An investigation into how DDR5 memory speed affects local LLM inference throughput.
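
Token generation on consumer hardware is typically memory-bandwidth bound, which is why DDR5 speed matters; a back-of-envelope estimate (all figures below are illustrative assumptions, not measurements from the bookmarked investigation):

```python
# Each generated token requires streaming roughly the whole model's weights
# from RAM, so peak bandwidth / model size gives an upper bound on tokens/s.

def ddr5_bandwidth_gbs(mt_per_s: int, channels: int = 2) -> float:
    """Theoretical peak bandwidth: transfers/s * 8 bytes per 64-bit channel."""
    return mt_per_s * 8 * channels / 1000

MODEL_GB = 4.1  # e.g. a 7B model quantized to ~4.5 bits per weight

for speed in (4800, 5600, 6400):
    bw = ddr5_bandwidth_gbs(speed)
    print(f"DDR5-{speed}: {bw:5.1f} GB/s -> upper bound ~{bw / MODEL_GB:4.1f} tok/s")
```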

  7. d-Matrix aims to transform the economics of large-scale inference with Corsair, a platform the company says delivers fast, commercially viable, and sustainable performance for AI inference at scale.

    2025-01-26 by klotz
  8. The article discusses the importance of fine-tuning machine learning models for optimal inference performance and explores popular tools like vLLM, TensorRT, ONNX Runtime, TorchServe, and DeepSpeed.
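
As one example from that tool list, a minimal vLLM offline-batch inference sketch (the model id is an assumption):

```python
from vllm import LLM, SamplingParams

# Load a model and generate completions for a batch of prompts.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain paged attention in one paragraph."], params)
for request_output in outputs:
    print(request_output.outputs[0].text)
```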

  9. Simon Willison explains how to use the mistral.rs library in Rust to run the Llama Vision model on a Mac M2 laptop. He provides a detailed example and discusses the memory usage and GPU utilization.

  10. TabbyAPI is a FastAPI-based application for generating text with an LLM (large language model) via the ExLlamaV2 backend. It supports various model types and features such as Hugging Face model downloading and embedding model support.

    2024-09-25 by klotz
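
TabbyAPI exposes an OpenAI-compatible HTTP API, so a standard client works against it; the host, port, key, and loaded model name below are assumptions for this sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="tabby-api-key")

response = client.chat.completions.create(
    model="my-exl2-model",  # whatever model the TabbyAPI server has loaded
    messages=[{"role": "user", "content": "Hello from the ExLlamaV2 backend!"}],
)
print(response.choices[0].message.content)
```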
