This study introduces the Human-Scene Vision-Language Model (HumanVLM), a domain-specific large vision-language model designed to serve as a foundation for human-scene vision-language tasks. The authors construct HumanCaption-10M, a large-scale multimodal image-text dataset of human-scene images; develop a captioning approach tailored to human-centered images; and train HumanVLM on the resulting data.
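To make the captioning use case concrete, below is a minimal sketch of how such a model could be queried for a human-centered image description, assuming the released checkpoint follows a LLaVA-style interface in Hugging Face transformers. The model id, image path, and prompt template are placeholders, not the paper's actual release or API.

```python
# Hypothetical sketch: human-scene captioning with a LLaVA-style VLM.
# "org/human-vlm" is a placeholder model id, not the official checkpoint.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "org/human-vlm"  # assumption: LLaVA-compatible checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# Load a human-centered image and ask for a scene-level description.
image = Image.open("person_in_scene.jpg")
prompt = "USER: <image>\nDescribe the person and the surrounding scene. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```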