klotz: vision* + llm*


  1. How to run Gemma 3 effectively with our GGUFs on llama.cpp, Ollama, Open WebUI and how to fine-tune with Unsloth! This page details running Gemma 3 on various platforms, including phones, and fine-tuning it using Unsloth, addressing potential issues with float16 precision and providing optimal configuration settings.
  2. Learn how to run and fine-tune Mistral Devstral 1.1, including Small-2507 and 2505. This guide covers official recommended settings, tutorials for running Devstral in Ollama and llama.cpp, experimental vision support, and fine-tuning with Unsloth.
  3. A summary of a workshop presented at PyCon US on building software with LLMs, covering setup, prompting, building tools (text-to-SQL, structured data extraction, semantic search/RAG), tool usage, and security considerations like prompt injection. It also discusses the current LLM landscape, including models from OpenAI, Gemini, Anthropic, and open-weight alternatives.
  4. This article details a new plugin, llm-video-frames, that allows users to feed video files into long context vision LLMs (like GPT-4.1) by converting them into a sequence of JPEG frames. It showcases how to install and use the plugin, provides examples with the Cleo video, and discusses the cost and technical details of the process. It also covers the development of the plugin using an LLM and highlights other features in LLM 0.25.
    2025-05-06 by klotz
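The frame-extraction step that llm-video-frames relies on can be sketched as a small helper that assembles an ffmpeg command. This is a hypothetical illustration, not the plugin's actual API; the function name, defaults, and output layout are assumptions.

```python
from pathlib import Path

def ffmpeg_frame_command(video_path: str, fps: float = 1.0,
                         out_dir: str = "frames") -> list[str]:
    """Build (but do not run) an ffmpeg command that samples `fps`
    frames per second from `video_path` into numbered JPEG files.
    Hypothetical sketch; the real plugin handles this internally."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",                       # sample at the requested rate
        str(Path(out_dir) / "frame_%04d.jpg"),     # frame_0001.jpg, frame_0002.jpg, ...
    ]

cmd = ffmpeg_frame_command("cleo.mp4", fps=2)
```

Each resulting JPEG would then be attached to the prompt as an image, which is how a long-context vision model such as GPT-4.1 can "watch" a video as a sequence of frames.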
  5. A review of the Qwen2.5-VL-32B large language model, noting its performance, capabilities, and how it runs on a 64GB Mac. Includes a demonstration with a map image and performance statistics.
    2025-03-26 by klotz
  6. Learn how to build Llama 3.2-Vision locally in a chat-like mode, and explore its Multimodal skills on a Colab notebook.
  7. Microsoft has released the OmniParser model on HuggingFace, a vision-based tool designed to parse UI screenshots into structured elements, enhancing intelligent GUI automation across platforms without relying on additional contextual data.
  8. Simon Willison explains how to use the mistral.rs library in Rust to run the Llama Vision model on a Mac M2 laptop. He provides a detailed example and discusses the memory usage and GPU utilization.
  9. Meta releases Llama 3.2, which features small and medium-sized vision LLMs (11B and 90B) alongside lightweight text-only models (1B and 3B). It also introduces the Llama Stack Distribution.
    2024-09-29 by klotz
  10. MLX-VLM: A package for running Vision LLMs on Mac using MLX.
    2024-09-10 by klotz



About - Propulsed by SemanticScuttle