Alibaba Cloud released its Qwen2.5-Omni-7B multimodal AI model, designed for cost-effective AI agents and capable of processing text, image, audio, and video inputs.
Mistral Small 3.1 is an open-source multimodal AI model optimized for consumer hardware, offering strong text and image processing, multilingual capabilities, and a balance of performance and accessibility. It excels in many areas but has limitations in long-context tasks and Middle Eastern language support.
Mistral Small 3.1 is a cutting-edge, open-source AI model released by Mistral AI, designed for efficiency and excelling in multimodal and multilingual tasks. It supports a 128k token context window and is optimized for real-time conversational AI and domain-specific fine-tuning.
Qwen2.5-VL, the latest vision-language model from Qwen, showcases enhanced image recognition, agentic behavior, video comprehension, document parsing, and more. It outperforms its predecessors across a range of benchmarks and tasks while improving efficiency.
Learn how to run Llama 3.2-Vision locally in a chat-like mode and explore its multimodal skills in a Colab notebook; a minimal local-inference sketch follows.
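As a concrete starting point, here is a minimal sketch of local chat-style inference with Llama 3.2-Vision via the Ollama Python client. This is one common local setup, not necessarily the article's exact approach; it assumes Ollama is installed, `ollama pull llama3.2-vision` has been run, and `photo.jpg` is a placeholder image path.

```python
import ollama

# One chat turn: a text question plus a local image attachment.
response = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "Describe this image in one sentence.",
        "images": ["photo.jpg"],  # placeholder path to a local image
    }],
)
print(response["message"]["content"])
```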
This article discusses how to build multimodal Retrieval-Augmented Generation (RAG) systems, which can process various file types with AI. It offers a beginner-friendly guide with example Python code and explains the three levels of multimodal RAG systems; a retrieval sketch follows below.
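To make the retrieval step concrete, here is a minimal sketch of the core of a multimodal RAG system: embedding text chunks and images into a shared CLIP space and retrieving the items nearest to a query. The model name, corpus, and file paths are illustrative assumptions, not the article's code; the retrieved items would then be passed to a multimodal LLM for answer generation.

```python
import torch
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP maps text and images into one shared embedding space.
model = SentenceTransformer("clip-ViT-B-32")

text_chunks = ["The quarterly report shows revenue growth of 12%."]
images = [Image.open("chart.png")]  # placeholder image path

# Embed both modalities, then stack them into a single corpus.
corpus_emb = torch.cat([
    model.encode(text_chunks, convert_to_tensor=True),
    model.encode(images, convert_to_tensor=True),
])

# Retrieve the corpus items most similar to the text query.
query_emb = model.encode("What does the revenue chart show?", convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
print(hits)  # e.g. [{'corpus_id': 1, 'score': ...}, ...]
```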
SmolVLM is a compact, efficient multimodal model designed for tasks involving text and image inputs, producing text outputs. It is capable of answering questions about images, describing visual content, and functioning as a pure language model without visual inputs. Developed for on-device applications, SmolVLM is lightweight yet performs well in multimodal tasks.
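For reference, the standard SmolVLM inference path looks roughly like the sketch below; the model ID follows the Hugging Face release, while the image path and prompt are placeholders.

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

# Build a chat-style prompt that interleaves an image with a question.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("photo.jpg")  # placeholder image path
inputs = processor(text=prompt, images=[image], return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```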
Meta releases Llama 3.2, which features small and medium-sized vision LLMs (11B and 90B) alongside lightweight text-only models (1B and 3B). It also introduces the Llama Stack Distribution.
Explores recent trends in LLM research, including multimodal LLMs, open-source LLMs, domain-specific LLMs, LLM agents, smaller LLMs, and non-Transformer LLMs. Mentions examples such as OpenAI's Sora, LLM360, BioGPT, StarCoder, and Mamba.
This article provides a step-by-step guide to fine-tuning the Florence-2 model for object detection, covering loading the pre-trained model, fine-tuning it on a custom dataset, and evaluating its performance; the core training step is sketched below.
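The heart of such a guide is the training loop; below is a minimal single-step sketch, loosely following the public Florence-2 fine-tuning recipe. The checkpoint name follows Microsoft's Hugging Face release, but the image path, the `<OD>` prompt, the location-token target string, and the learning rate are illustrative assumptions rather than the article's exact code.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

ckpt = "microsoft/Florence-2-base-ft"
model = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(ckpt, trust_remote_code=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

image = Image.open("sample.jpg")  # placeholder training image
# Florence-2 encodes boxes as <loc_*> tokens on a 0-999 normalized grid.
target = "car<loc_100><loc_200><loc_300><loc_400>"

inputs = processor(text="<OD>", images=image, return_tensors="pt")
labels = processor.tokenizer(target, return_tensors="pt").input_ids

# One optimization step on one (image, annotation) pair.
outputs = model(input_ids=inputs["input_ids"],
                pixel_values=inputs["pixel_values"],
                labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```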