Tags: vlm*

  1. M5Stack has launched the AI-88502 LLM Accelerator M.2 Kit, based on the LLM-8850 M.2 card with a 24 TOPS Axera AX8850 SoC, offering an alternative to the Raspberry Pi AI HAT+ 2 for LLM and AI vision workloads.
  2. This study introduces the Human-Scene Vision-Language Model (HumanVLM), a domain-specific large vision-language model designed to provide a foundation for human-scene vision-language tasks. The authors create a large-scale human-scene multimodal image-text dataset (HumanCaption-10M), develop a captioning approach for human-centered images, and train HumanVLM.
  3. A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
    2026-01-02, by klotz
  4. A collection of Docker-based web user interfaces for running generative AI models locally.
    2025-12-17, by klotz
  5. NVIDIA Nemotron Parse v1.1 is designed to understand document semantics and extract text and table elements with spatial grounding. It transforms unstructured documents into actionable, machine-usable representations.
  6. This tutorial guides you through installing and using an inference snap, specifically Qwen 2.5 VL, a multi-modal large language model. It covers installation, status checks, basic chat, and configuring Open WebUI for image-based prompts.
  7. A guide to installing Open Genera 2.0, a Lisp environment originally from Symbolics, on a modern 64-bit Linux system. It details the necessary steps, including installing dependencies, setting up networking, and patching for compatibility.
  8. This article discusses how to apply vision language models (VLMs) to document understanding, covering application areas such as agentic use cases, question answering, classification, and information extraction, as well as limitations such as cost and the difficulty of processing long documents.
  9. An analysis of how well different AI systems perform in describing images and answering questions about them. The article compares ChatGPT, Gemini, Llama, and Claude using four images: a hand, a bottle of wine, a piece of pastry, and a flower.
  10. A Chrome extension that uses AI (LLaVA) to generate descriptive filenames for images when downloading them.

SemanticScuttle - klotz.me: tagged with "vlm"
