Tags: video* + llm*

  1. SmolVLM2 brings video understanding to efficient models that can run on devices ranging from phones to servers. The release includes models in three sizes (2.2B, 500M, and 256M parameters) with Python and Swift API support. The models offer video understanding with reduced memory consumption and come with a suite of demo applications for practical use.

  2. Qwen2.5-VL-3B-Instruct is the latest addition to the Qwen family of vision-language models, developed by the Qwen team at Alibaba Cloud and hosted on Hugging Face. It features enhanced capabilities in understanding visual content and generating structured outputs. It can act as a visual agent that interacts directly with tools and operates computer and phone interfaces. Qwen2.5-VL can comprehend videos up to an hour long and localize objects within images using bounding boxes or points. It is available in three sizes: 3, 7, and 72 billion parameters.

    2025-02-08 by klotz
  3. LLM 0.17 release enables multi-modal input, allowing users to send images, audio, and video files to Large Language Models like GPT-4o, Llama, and Gemini, with a Python API and cost-effective pricing.

    2024-10-29 by klotz
  4. The author records a screen capture of their Gmail account and uses Google Gemini to extract numeric values from the video.

  5. A tool to transcribe and summarize videos from multiple sources using AI models in Google Colab or locally.

    2024-10-06 by klotz
  6. 2023-08-18 by klotz
  7. 2023-04-21 by klotz
