0 bookmark(s) - Sort by: Date ↓ / Title /
Qwen2.5-VL, the latest vision-language model from Qwen, showcases enhanced image recognition, agentic behavior, video comprehension, document parsing, and more. It outperforms previous models in various benchmarks and tasks, offering improved efficiency and performance.
Learn how to build Llama 3.2-Vision locally in a chat-like mode, and explore its Multimodal skills on a Colab notebook.
This article discusses the development of multimodal Retrieval Augmented Generation (RAG) systems which allow for the processing of various file types using AI. The article provides a beginner-friendly guide with example Python code and explains the three levels of multimodal RAG systems.
SmolVLM is a compact, efficient multimodal model designed for tasks involving text and image inputs, producing text outputs. It is capable of answering questions about images, describing visual content, and functioning as a pure language model without visual inputs. Developed for on-device applications, SmolVLM is lightweight yet performs well in multimodal tasks.
Meta releases Llama 3.2, which features small and medium-sized vision LLMs (11B and 90B) alongside lightweight text-only models (1B and 3B). It also introduces the Llama Stack Distribution.
Explores recent trends in LLM research, including multi-modal LLMs, open-source LLMs, domain-specific LLMs, LLM agents, smaller LLMs, and Non-Transformer LLMs. Mentions examples such as OpenAI's Sora, LLM360, BioGPT, StarCoder, and Mamba.
This article provides a step-by-step guide on fine-tuning the Florence-2 model for object detection tasks, including loading the pre-trained model, fine-tuning with a custom dataset, and evaluating the model's performance.
The Pipe is a multimodal-first tool for feeding files and web pages into vision-language models such as GPT-4V. It is best for LLM and RAG applications that want to support comprehensive textual and visual understanding across a wide range of data sources. The Pipe is available as a 24/7 hosted API at thepi.pe, or it can be set up locally to let you run the compute.
First / Previous / Next / Last
/ Page 1 of 0