klotz: vlm*

  1. A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
    2026-01-02, by klotz
  2. A collection of Docker-based web user interfaces for running generative AI models locally.
    2025-12-17, by klotz
  3. NVIDIA Nemotron Parse v1.1 is designed to understand document semantics and extract text and table elements with spatial grounding. It transforms unstructured documents into actionable, machine-usable representations.
  4. This tutorial guides you through installing and using an inference snap, specifically Qwen 2.5 VL, a multi-modal large language model. It covers installation, status checks, basic chat, and configuring Open WebUI for image-based prompts.
  5. A guide to installing Open Genera 2.0, a Lisp environment originally from Symbolics, on a modern 64-bit Linux system. It details the necessary steps, including installing dependencies, setting up networking, and patching for compatibility.
  6. This article discusses how to apply vision language models (VLMs) to document understanding, covering application areas like agentic use cases, question answering, classification, and information extraction, as well as limitations like cost and processing long documents.
  7. An analysis of how well different AI systems perform in describing images and answering questions about them. The article compares ChatGPT, Gemini, Llama, and Claude using four images: a hand, a bottle of wine, a piece of pastry, and a flower.
  8. A Chrome extension that uses AI (LLaVA) to generate descriptive filenames for images as they are downloaded.
  9. SmolVLM2 is a family of efficient video understanding models that can run on a range of devices, from phones to servers. The release includes models in three sizes (2.2B, 500M, and 256M) with Python and Swift API support. The models reduce memory consumption and come with a suite of demo applications for practical use.
  10. The Lucid Vision Extension integrates advanced vision models into textgen-webui, enabling contextualized conversations about images and direct communication with vision models.
