High-performance deployment of the vLLM engine, optimized for serving large language models at scale.
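For a sense of what running vLLM looks like, here is a minimal offline-inference sketch; the model name is illustrative, and any Hugging Face model vLLM supports would work (vLLM also ships an OpenAI-compatible HTTP server via `python -m vllm.entrypoints.openai.api_server`):

```python
# Minimal vLLM sketch: load a model and generate from a prompt.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # illustrative model; loads weights onto the GPU
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```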
This article discusses how to overcome the limitations of retrieval-augmented generation (RAG) models by building an AI assistant around advanced SQL vector queries. The author uses tools such as MyScaleDB, OpenAI, LangChain, Hugging Face, and the HackerNews API to develop an application that improves the accuracy and efficiency of the data retrieval process.
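For flavor, a vector query in MyScaleDB is ordinary SQL ordered by a distance expression. The sketch below is an assumption-heavy illustration, not the article's code: the table name, column names, host credentials, and embedding model are all hypothetical, and it assumes the ClickHouse-compatible `clickhouse-connect` client and MyScale's `distance()` function.

```python
# Hypothetical sketch: retrieve HackerNews posts from MyScaleDB by vector similarity.
import clickhouse_connect
from openai import OpenAI

db = clickhouse_connect.get_client(
    host="your-myscale-host", port=443, username="user", password="pass"
)
oai = OpenAI()

# Embed the question with the same model assumed to have indexed the posts.
question = "What are people saying about GPU prices?"
emb = oai.embeddings.create(model="text-embedding-3-small", input=question)
qvec = emb.data[0].embedding

# A Python list's repr ([0.1, 0.2, ...]) is also a valid ClickHouse array literal,
# so it can be interpolated directly into the SQL for this sketch.
rows = db.query(f"""
    SELECT title, text, distance(embedding, {qvec}) AS dist
    FROM hackernews_posts
    ORDER BY dist ASC
    LIMIT 5
""").result_rows

for title, text, dist in rows:
    print(dist, title)
```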
A tutorial showing you how to bring real-time data to LLMs through function calling, using OpenAI's latest LLM, GPT-4o.
Function calling allows you to more reliably get structured data back from the model (a minimal sketch follows this list). For example, you can:
Create chatbots that answer questions by calling external APIs (e.g., ChatGPT Plugins)
e.g. define functions like send_email(to: string, body: string), or get_current_weather(location: string, unit: 'celsius' | 'fahrenheit')
Convert natural language into API calls
e.g. convert "Who are my top customers?" to get_customers(min_revenue: int, created_before: string, limit: int) and call your internal API
Extract structured data from text
e.g. define a function called extract_data(name: string, birthday: string), or sql_query(query: string)
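To make the mechanics concrete, here is a minimal sketch of the round trip with the OpenAI Python SDK, using the get_current_weather signature from the list above. The weather lookup itself is stubbed out; in a real application it would call an actual weather API.

```python
import json
from openai import OpenAI

client = OpenAI()

# Describe get_current_weather(location, unit) to the model as a callable tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris in celsius?"}]
resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)

# When the model decides the function should run, it returns structured
# arguments instead of prose.
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)  # e.g. {"location": "Paris", "unit": "celsius"}

# Stubbed result standing in for a real weather API call.
result = {"location": args["location"], "temperature": 18, "unit": args.get("unit", "celsius")}

# Feed the result back so the model can compose the final natural-language answer.
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```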
llama-cpp-python offers a web server which aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).
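As a sketch of the drop-in behavior: start the server against a local GGUF model file (the path below is an assumption), then point the standard OpenAI client at the local address instead of api.openai.com.

```python
# First start the server (model path is illustrative):
#   python -m llama_cpp.server --model ./models/llama-2-7b-chat.Q4_K_M.gguf
#
# Then talk to it with the unmodified OpenAI client.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # llama-cpp-python's default address
    api_key="sk-no-key-required",         # the local server does not check the key by default
)

resp = client.chat.completions.create(
    model="local-model",  # placeholder; the server serves the model it was started with
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```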