SemanticScuttle - klotz.me » klotz: vision-language model

klotz: vision-language model*

HumanVLM: Foundation for Human-Scene Vision-Language Model

This study introduces the Human-Scene Vision-Language Model (HumanVLM), a domain-specific large vision-language model designed to serve as a foundation for human-scene vision-language tasks. The authors build a large-scale human-scene multimodal image-text dataset (HumanCaption-10M), develop a captioning approach for human-centered images, and train HumanVLM.

2026-01-28 Tags: human, scene, multimodal, dataset, vision-language model, vlm by klotz

Qwen2.5-VL Technical Report

Qwen2.5-VL is a flagship model of the Qwen vision-language series, showcasing advancements in visual recognition, object localization, document parsing, and long-video comprehension. It introduces dynamic resolution processing and absolute time encoding, allowing it to handle complex inputs and maintain native resolution. Available in three sizes, it suits various applications from edge AI to high-performance computing, matching state-of-the-art models in document and diagram understanding while preserving strong linguistic capabilities.

2025-02-21 Tags: qwen2.5-vl, vision-language model, llm, huggingface, qwen, alibaba by klotz

Introducing Qwen2.5-VL: Advanced Vision-Language Model Capabilities

Qwen2.5-VL, the latest vision-language model from Qwen, showcases enhanced image recognition, agentic behavior, video comprehension, document parsing, and more. It outperforms previous models in various benchmarks and tasks, offering improved efficiency and performance.

2025-02-09 Tags: qwen2.5-vl, vision-language model, image recognition, document parsing, ocr, multimodal, llm, machine learning by klotz
