LiteLLM is a library for deploying and managing LLM (Large Language Model) APIs through a standardized interface. It supports multiple LLM providers, includes proxy-server features such as load balancing and cost tracking, and offers integrations for logging and observability.
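As a minimal sketch of how such a deployment might look, the following assumes the LiteLLM proxy's `model_list` config format; the model names and environment-variable references are placeholders, not from the source:

```yaml
# Route two providers behind one standardized interface.
model_list:
  - model_name: gpt-4o                      # alias that clients request
    litellm_params:
      model: openai/gpt-4o                  # provider/model in LiteLLM's format
      api_key: os.environ/OPENAI_API_KEY    # read the key from the environment
  - model_name: claude
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
```

Clients then call the proxy with an alias such as `gpt-4o` or `claude`, and LiteLLM handles the provider-specific request format behind the scenes.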
This page provides documentation for the rerank API, including endpoints, request parameters, and response formats.
This repository contains the Llama Stack API specifications, along with API Providers and Llama Stack Distributions. The Llama Stack aims to standardize the building blocks needed for generative AI applications across various development stages: it defines APIs for Inference, Safety, Memory, Agentic System, Evaluation, Post Training, Synthetic Data Generation, and Reward Scoring, while Providers supply concrete implementations of these APIs, backed either by open-source libraries or by remote REST services.
TabbyAPI is a FastAPI-based application for generating text with large language models (LLMs) via the ExLlamaV2 backend. It supports various model types and features such as Hugging Face model downloading, embedding model support, and more.
Airbyte, a data integration platform, has introduced a feature that lets AI automatically create API connectors by reading documentation. This extends its low-code/no-code capabilities and supports the growing enterprise demand for robust data services to power AI initiatives.
High-performance deployment of the vLLM serving engine, optimized for serving large language models at scale.
Hugging Face introduces a unified tool use API across multiple model families, making it easier to implement tool use in language models.
Hugging Face has extended chat templates to support tools, offering a unified approach to tool use with the following features:
- Defining tools: Tools can be defined using JSON schema or Python functions with clear names, accurate type hints, and complete docstrings.
- Adding tool calls to the chat: Tool calls are added as a field of assistant messages, including the tool type, name, and arguments.
- Adding tool responses to the chat: Tool responses are added as tool messages containing the tool name and content.
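The three pieces above can be sketched in plain Python dicts. The field names follow the format described in the list; the `get_current_temperature` function and its values are illustrative, not from the source:

```python
import json

# 1. Defining a tool: a Python function with clear name, type hints, and a
#    docstring -- or the equivalent JSON schema, written out by hand here.
def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The city to look up, e.g. "Paris".
    """
    return 22.0  # placeholder value standing in for a real lookup

tool_schema = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get the current temperature at a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

# 2. A tool call recorded as a field of the assistant message,
#    with the tool type, name, and arguments.
assistant_msg = {
    "role": "assistant",
    "tool_calls": [
        {
            "type": "function",
            "function": {
                "name": "get_current_temperature",
                "arguments": {"location": "Paris"},
            },
        }
    ],
}

# 3. The tool's response, added as a "tool" message
#    containing the tool name and content.
args = assistant_msg["tool_calls"][0]["function"]["arguments"]
tool_msg = {
    "role": "tool",
    "name": "get_current_temperature",
    "content": json.dumps(get_current_temperature(**args)),
}
```

A chat template that supports tools can then render the tool schemas, the assistant's tool call, and the tool response into the model's expected prompt format.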
OnDemand AI provides API services for media, plugins, and other services, allowing developers to upload media, use NLP, and deploy machine learning models. It also supports serverless application deployment as well as BYOM (Bring Your Own Model) and BYOI (Bring Your Own Inference).
This article discusses how to overcome the limitations of retrieval-augmented generation (RAG) by building an AI assistant with advanced SQL vector queries. The author uses tools such as MyScaleDB, OpenAI, LangChain, Hugging Face, and the Hacker News API to develop an application that improves the accuracy and efficiency of the data retrieval process.
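As a hypothetical sketch of the kind of hybrid query the article describes, the snippet below combines a plain SQL metadata filter with a vector-similarity ordering. The table and column names are invented, and MyScaleDB's actual `distance()` syntax may differ; real code should also parameterize the filter rather than interpolate it:

```python
def build_vector_query(embedding, category, limit=5):
    """Build a SQL query that filters rows by metadata and ranks
    the survivors by vector distance to a query embedding."""
    vec = ", ".join(f"{x:.4f}" for x in embedding)
    return (
        "SELECT id, title, "
        f"distance(embedding, [{vec}]) AS dist "
        "FROM posts "
        f"WHERE category = '{category}' "  # illustrative; parameterize in practice
        "ORDER BY dist ASC "
        f"LIMIT {limit}"
    )
```

Pushing the filter and the similarity ranking into one SQL statement is what lets the database prune candidates before the (expensive) distance sort, which is the efficiency gain the article is after.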
Learn how to build an open LLM app using Hermes 2 Pro, a powerful LLM based on Meta's Llama 3 architecture. This tutorial explains how to deploy Hermes 2 Pro locally, create a function that tracks flight status using the FlightAware API, and integrate it with the LLM.
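A hypothetical sketch of such a flight-status tool is shown below. The function name, fields, and mock data are invented for illustration; the tutorial's real function calls the FlightAware API instead of a local dict:

```python
import json

def get_flight_status(flight_id: str) -> str:
    """Look up the status of a flight by its identifier.

    Args:
        flight_id: Flight identifier, e.g. "UA123".
    """
    # Placeholder data standing in for a FlightAware API call.
    mock_db = {
        "UA123": {"status": "en route", "departure": "SFO", "arrival": "EWR"},
    }
    info = mock_db.get(flight_id, {"status": "unknown"})
    return json.dumps(info)

# The LLM is given a schema for the function; when it emits a matching
# tool call, the app executes it and feeds the JSON result back.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_flight_status",
            "description": "Look up the status of a flight by its identifier.",
            "parameters": {
                "type": "object",
                "properties": {"flight_id": {"type": "string"}},
                "required": ["flight_id"],
            },
        },
    }
]
```

In the tutorial's flow, the locally deployed Hermes 2 Pro decides when to call the function, and the application loops the returned JSON back into the conversation so the model can phrase the answer for the user.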