SmolVLM is a compact, efficient multimodal model that takes text and image inputs and produces text output. It can answer questions about images, describe visual content, and act as a pure language model when no image is provided. Developed for on-device applications, SmolVLM is lightweight yet performs well on multimodal tasks.