Tags: llm* + inference*

  1. This Space demonstrates a simple method for embedding text with an LLM (large language model) via the Hugging Face Inference API, showing how to convert text into numerical vector representations useful for semantic search and similarity comparisons.
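
A minimal sketch of that workflow, assuming the huggingface_hub Python client and a sentence-transformers model id (both are illustrative choices, not details taken from the bookmarked Space):

```python
import numpy as np
from huggingface_hub import InferenceClient

# The model id and token placeholder are assumptions for illustration;
# any feature-extraction model on the Hub should work the same way.
client = InferenceClient(
    model="sentence-transformers/all-MiniLM-L6-v2",
    token="hf_...",  # your Hugging Face API token
)

def embed(text: str) -> np.ndarray:
    # feature_extraction returns the embedding for the input text; some models
    # return per-token vectors instead, which would need mean-pooling first.
    return np.asarray(client.feature_extraction(text))

a = embed("How do I reset my password?")
b = embed("Steps for password recovery")
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```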

  2. NVIDIA DGX Spark is a desktop-friendly AI supercomputer powered by the NVIDIA GB10 Grace Blackwell Superchip, delivering 1000 AI TOPS of performance with 128GB of memory. It is designed for prototyping, fine-tuning, and inference of large AI models.

  3. Can a much smaller model compete with DeepSeek R1? Alibaba's Qwen team aims to find out with its latest release, QwQ. Despite having a fraction of R1's claimed 671 billion parameters, Alibaba touts the comparatively compact 32-billion-parameter 'reasoning' model as outperforming R1 on select math, coding, and function-calling benchmarks.

    2025-03-17 by klotz
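
A hedged sketch of trying the model locally with Hugging Face transformers; the repo id Qwen/QwQ-32B and the prompt are assumptions, and a 32B model needs substantial GPU memory:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # assumed Hub repo id; verify against the Qwen org
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 30?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous budget.
output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```
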
  4. The NVIDIA Jetson Orin Nano Super is highlighted as a compact, powerful computing platform for edge AI. It supports large-scale inference tasks at the edge when paired with high-capacity storage such as the Solidigm 122.88TB SSD. The review explores applications including wildlife conservation, surveillance, and AI model distribution, emphasizing its potential in real-world deployments.

  5. The Cerebras API offers low-latency AI model inference using Cerebras Wafer-Scale Engines and CS-3 systems, providing access to Meta's Llama models for conversational applications.

    2025-02-08 by klotz
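
Cerebras documents an OpenAI-compatible endpoint, so a standard client can talk to it; the base URL, key format, and model name below are assumptions to check against the current docs:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key="csk-...",  # your Cerebras API key
)

response = client.chat.completions.create(
    model="llama3.1-8b",  # assumed model id for a hosted Llama model
    messages=[{"role": "user", "content": "What makes wafer-scale inference fast?"}],
)
print(response.choices[0].message.content)
```
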
  6. An investigation into how DDR5 memory speed affects local LLM inference throughput.
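
Token generation on consumer hardware is typically memory-bandwidth bound, which is why DDR5 speed matters; a back-of-envelope estimate (all figures below are illustrative assumptions, not measurements from the bookmarked investigation):

```python
# Each generated token requires streaming roughly the whole model's weights
# from RAM, so peak bandwidth / model size gives an upper bound on tokens/s.

def ddr5_bandwidth_gbs(mt_per_s: int, channels: int = 2) -> float:
    """Theoretical peak bandwidth: transfers/s * 8 bytes per 64-bit channel."""
    return mt_per_s * 8 * channels / 1000

MODEL_GB = 4.1  # e.g. a 7B model quantized to ~4.5 bits per weight

for speed in (4800, 5600, 6400):
    bw = ddr5_bandwidth_gbs(speed)
    print(f"DDR5-{speed}: {bw:5.1f} GB/s -> upper bound ~{bw / MODEL_GB:4.1f} tok/s")
```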

  7. d-Matrix aims to transform the economics of large-scale inference with Corsair, a platform the company says delivers fast, commercially viable, and sustainable performance for AI inference at scale.

    2025-01-26 by klotz
  8. The article discusses the importance of fine-tuning machine learning models for optimal inference performance and explores popular tools like vLLM, TensorRT, ONNX Runtime, TorchServe, and DeepSpeed.
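
As one example from that tool list, a minimal vLLM offline-batch inference sketch (the model id is an assumption):

```python
from vllm import LLM, SamplingParams

# Load a model and generate completions for a batch of prompts.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain paged attention in one paragraph."], params)
for request_output in outputs:
    print(request_output.outputs[0].text)
```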

  9. Simon Willison explains how to use the mistral.rs library in Rust to run the Llama Vision model on a Mac M2 laptop. He provides a detailed example and discusses the memory usage and GPU utilization.

  10. TabbyAPI is a FastAPI-based application for generating text with an LLM (large language model) via the ExLlamaV2 backend. It supports various model types and features such as Hugging Face model downloading and embedding model support.

    2024-09-25 by klotz
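
TabbyAPI exposes an OpenAI-compatible HTTP API, so a standard client works against it; the host, port, key, and loaded model name below are assumptions for this sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="tabby-api-key")

response = client.chat.completions.create(
    model="my-exl2-model",  # whatever model the TabbyAPI server has loaded
    messages=[{"role": "user", "content": "Hello from the ExLlamaV2 backend!"}],
)
print(response.choices[0].message.content)
```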
