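The LLM 0.17 release adds multi-modal input. Users can now attach images, audio, and video files to prompts for large language models such as GPT-4o, Llama, and Gemini. The feature is available through both the command-line tool and the Python API, and the author notes that running such prompts can be inexpensive.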
The author records a screen capture of their Gmail account and uses Google Gemini to extract numeric values from the video.