Tags: multimodal*


  1. Learn how to build Llama 3.2-Vision locally in a chat-like mode, and explore its multimodal skills in a Colab notebook.
  2. This article discusses how to build multimodal retrieval-augmented generation (RAG) systems, which can process various file types using AI. It is a beginner-friendly guide with example Python code, and it explains the three levels of multimodal RAG systems.
    2024-12-07 by klotz
  3. SmolVLM is a compact, efficient multimodal model that takes text and image inputs and produces text outputs. It can answer questions about images, describe visual content, and function as a pure language model when given no visual input. Developed for on-device applications, SmolVLM is lightweight yet performs well on multimodal tasks.
    2024-11-28 by klotz
  4. Meta releases Llama 3.2, which features small and medium-sized vision LLMs (11B and 90B) alongside lightweight text-only models (1B and 3B). It also introduces the Llama Stack Distribution.
    2024-09-29 by klotz
  5. Explores recent trends in LLM research, including multimodal LLMs, open-source LLMs, domain-specific LLMs, LLM agents, smaller LLMs, and non-Transformer LLMs. Mentions examples such as OpenAI's Sora, LLM360, BioGPT, StarCoder, and Mamba.
  6. This article provides a step-by-step guide on fine-tuning the Florence-2 model for object detection tasks, including loading the pre-trained model, fine-tuning with a custom dataset, and evaluating the model's performance.
  7. The Pipe is a multimodal-first tool for feeding files and web pages into vision-language models such as GPT-4V. It is best for LLM and RAG applications that want to support comprehensive textual and visual understanding across a wide range of data sources. The Pipe is available as a 24/7 hosted API at thepi.pe, or it can be set up locally to let you run the compute.
