This article discusses how GitHub Models provides a free, OpenAI-compatible inference API to make AI-powered open source software more accessible. It details the challenges of AI inference (cost, local resources, distribution) and how GitHub Models addresses them, including setup, CI/CD integration, and scaling.
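Because the API is OpenAI-compatible, a plain HTTP client is enough to call it. A minimal sketch using only the Python standard library — the base URL, model id, and the `GITHUB_TOKEN` environment variable are assumptions here; check the GitHub Models documentation for the current values:

```python
import json
import os
import urllib.request

# Assumed endpoint and model id for GitHub Models; verify against the docs.
BASE_URL = "https://models.github.ai/inference"
MODEL = "openai/gpt-4o-mini"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request, authorized with a
    GitHub personal access token read from the environment."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('GITHUB_TOKEN', '')}",
        },
        method="POST",
    )

req = build_chat_request("Summarize this repository in one sentence.")
# with urllib.request.urlopen(req) as resp:  # sends the call once a token is set
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Since the schema matches OpenAI's, the same request shape also works through any OpenAI SDK pointed at the GitHub Models base URL.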
Running GenAI models is easy. Scaling them to thousands of users, not so much. This guide details avenues for scaling AI workloads from proofs of concept to production-ready deployments, covering API integration, on-prem deployment considerations, hardware requirements, and tools like vLLM and NVIDIA NIMs.
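For the on-prem hardware question, a useful back-of-envelope rule is weights ≈ parameter count × bytes per parameter, plus runtime headroom for KV cache and activations. A rough sketch — the flat 20% overhead factor is an assumption, and real usage depends on batch size, context length, and the serving stack:

```python
def estimate_vram_gb(params_billion: float,
                     bytes_per_param: int = 2,   # 2 for fp16/bf16, 1 for int8
                     overhead: float = 1.2) -> float:
    """Weights-only VRAM estimate with a flat overhead factor for KV cache
    and activations. A rough guide for sizing GPUs, not a guarantee."""
    return params_billion * bytes_per_param * overhead

print(estimate_vram_gb(8))   # ~19 GB: an 8B model in fp16 fits on a 24 GB card
print(estimate_vram_gb(70))  # ~168 GB: a 70B model in fp16 spans multiple GPUs
```

Numbers like these explain why quantization (halving `bytes_per_param`) is often the first lever pulled when moving from proof of concept to production.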
This Space demonstrates a simple method for embedding text using an LLM (Large Language Model) via the Hugging Face Inference API. It showcases how to convert text into numerical vector representations, useful for semantic search and similarity comparisons.
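The pattern behind such a Space can be sketched with the standard library: POST text to an embedding model on the Inference API, then compare the returned vectors with cosine similarity. The model id and URL shape below are assumptions; consult the Hugging Face Inference API docs for current endpoints:

```python
import json
import math
import urllib.request

# Assumed model and endpoint; the Inference API hosts many embedding models.
API_URL = ("https://api-inference.huggingface.co/models/"
           "sentence-transformers/all-MiniLM-L6-v2")

def build_embed_request(texts: list, token: str) -> urllib.request.Request:
    """Build a feature-extraction request; the response is one vector per text."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps({"inputs": texts}).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )

def cosine(a: list, b: list) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Once the API returns vectors, ranking documents by `cosine(query_vec, doc_vec)` is all a basic semantic search needs.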
The Cerebras API offers low-latency AI model inference using Cerebras Wafer-Scale Engines and CS-3 systems, providing access to Meta's Llama models for conversational applications.
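The Cerebras API reportedly follows the familiar OpenAI-style chat schema, so a request body can be sketched as below. The model id `llama3.1-8b` is an assumption; check the Cerebras documentation for the current model catalog. Streaming is worth enabling in conversational apps to surface the low time-to-first-token:

```python
import json

def build_chat_payload(prompt: str,
                       model: str = "llama3.1-8b",  # assumed model id
                       stream: bool = True) -> dict:
    """OpenAI-style chat payload; stream=True delivers tokens as they arrive,
    which is where low-latency inference pays off in conversational UIs."""
    return {
        "model": model,
        "stream": stream,
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": prompt},
        ],
    }

body = json.dumps(build_chat_payload("What is a wafer-scale engine?"))
```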
TabbyAPI is a FastAPI-based application for generating text with an LLM (large language model) through the ExLlamaV2 backend. It supports various model types and features such as Hugging Face model downloading, embedding model support, and more.
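Since TabbyAPI exposes an OpenAI-style HTTP interface, a local client can be sketched with the standard library. The default port 5000 and the `x-api-key` header are assumptions drawn from common TabbyAPI setups; check your instance's config for the actual values:

```python
import json
import urllib.request

# Assumed defaults for a local TabbyAPI instance; adjust host, port, and key
# to match your server configuration.
TABBY_URL = "http://127.0.0.1:5000/v1/completions"

def build_completion_request(prompt: str, max_tokens: int = 128,
                             api_key: str = "change-me") -> urllib.request.Request:
    """Build an OpenAI-style text-completion request for the local server."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        TABBY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )

req = build_completion_request("Once upon a time,")
# urllib.request.urlopen(req) returns a JSON body with generated "choices"
# once the server is running with a model loaded.
```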