A summary of a workshop presented at PyCon US on building software on top of LLMs, covering setup, prompting, building tools (text-to-SQL, structured data extraction, semantic search/RAG), tool calling, and security considerations such as prompt injection. It also surveys the current LLM landscape, including models from OpenAI, Google (Gemini), and Anthropic, as well as open-weight alternatives.
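As a flavor of the prompting patterns such a workshop covers, here is a minimal sketch using the `llm` Python library; the library choice, model ID, and prompt are illustrative assumptions rather than workshop material:

```python
import llm  # pip install llm; assumes an OpenAI key set via `llm keys set openai`

# Fetch a model by ID and send a prompt with a system instruction -- the
# basic structured-data-extraction pattern
model = llm.get_model("gpt-4o-mini")
response = model.prompt(
    "Extract the city names from: 'Flights from Portland to Pittsburgh are delayed.'",
    system="You are a precise extraction assistant. Reply with a JSON array of strings.",
)
print(response.text())
```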
This article introduces a new plugin, llm-video-frames, which lets users feed video files into long-context vision LLMs (such as GPT-4.1) by converting them into a sequence of JPEG frames. It shows how to install and use the plugin, walks through examples with the Cleo video, and discusses the cost and technical details of the process. It also covers how the plugin was developed with the help of an LLM and highlights other features in LLM 0.25.
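The underlying trick can be sketched without the plugin: pull frames out with ffmpeg, then pass them to a vision model as image attachments. The file names, frame rate, and model ID below are illustrative, not taken from the article:

```python
import subprocess
from pathlib import Path

import llm

# Extract one JPEG frame per second -- the same basic conversion the plugin automates
frames_dir = Path("frames")
frames_dir.mkdir(exist_ok=True)
subprocess.run(
    ["ffmpeg", "-i", "video.mp4", "-vf", "fps=1", str(frames_dir / "frame_%04d.jpg")],
    check=True,
)

# Send every frame to a long-context vision model as an attachment
model = llm.get_model("gpt-4.1-mini")
response = model.prompt(
    "Describe what happens in this video.",
    attachments=[llm.Attachment(path=str(p)) for p in sorted(frames_dir.glob("*.jpg"))],
)
print(response.text())
```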
A review of the Qwen2.5-VL-32B vision-language model, covering its performance and capabilities and how it runs on a 64GB Mac. Includes a demonstration against a map image and performance statistics.
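As one generic way to try the checkpoint (not the Mac setup used in the review), Hugging Face's transformers integration can answer questions about an image; the image path is a placeholder:

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2.5-VL-32B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Chat-style message mixing an image with a text question
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "map.png"},  # placeholder image path
        {"type": "text", "text": "Describe this map in detail."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs, return_tensors="pt"
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding the answer
print(processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0])
```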
SenseCraft AI is a free, web-based platform designed for beginners; its no-code, application-oriented approach simplifies and accelerates the creation of AI applications.
This article provides a step-by-step guide to configuring model output over MQTT for the XIAO ESP32S3 Sense board on the SenseCraft AI platform.
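On the receiving end, a small paho-mqtt subscriber can log whatever the board publishes; the broker address and topic below are placeholders for the values configured on the platform:

```python
import json

import paho.mqtt.client as mqtt  # pip install paho-mqtt

BROKER = "192.168.1.10"           # placeholder: your MQTT broker
TOPIC = "sensecraft/xiao/output"  # placeholder: topic set in SenseCraft AI


def on_message(client, userdata, message):
    # Inference results typically arrive as JSON; fall back to raw text
    try:
        print(json.loads(message.payload))
    except json.JSONDecodeError:
        print(message.payload.decode(errors="replace"))


client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x API
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe(TOPIC)
client.loop_forever()
```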
Learn how to run Llama 3.2-Vision locally in a chat-like mode and explore its multimodal skills in a Colab notebook.
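One common way to get that chat-like loop locally is Ollama's Python client; the model tag and image path here are assumptions, not necessarily what the article uses:

```python
import ollama  # pip install ollama; assumes `ollama pull llama3.2-vision` has been run

# Single multimodal turn: an image passed alongside the text prompt
response = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "What is in this picture?",
        "images": ["photo.jpg"],  # placeholder image path
    }],
)
print(response["message"]["content"])
```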
Google DeepMind introduced PaliGemma 2, a new family of vision-language models with parameter counts ranging from 3 billion to 28 billion, designed to generalize across tasks and adapt to varied input types, including multiple image resolutions.
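A hedged sketch of running one of the checkpoints through transformers; the model ID names the smallest 224-pixel variant, the image path is a placeholder, and the task-prefix prompt ("caption en") follows PaliGemma's documented conventions:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-pt-224"  # smallest size and resolution variant
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = AutoProcessor.from_pretrained(model_id)

# PaliGemma is prompted with short task prefixes rather than free-form chat
image = Image.open("example.jpg")  # placeholder image path
inputs = processor(text="caption en", images=image, return_tensors="pt")
inputs = inputs.to(torch.bfloat16).to(model.device)  # match the model's dtype
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=40)
# Drop the prompt tokens so only the generated caption is printed
print(processor.decode(output[0][input_len:], skip_special_tokens=True))
```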
A new study reveals how the brain compensates for rapid eye movements, maintaining stable visual perception despite constantly shifting input. The researchers found that this stability mechanism breaks down for non-rigid motion such as rotating vortices.
Microsoft has released the OmniParser model on HuggingFace, a vision-based tool designed to parse UI screenshots into structured elements, enhancing intelligent GUI automation across platforms without relying on additional contextual data.
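Since the weights are on the Hub, pulling them down for local experimentation is a one-liner with huggingface_hub; only the repo ID comes from the release, the rest follows library defaults:

```python
from huggingface_hub import snapshot_download  # pip install huggingface_hub

# Fetch the OmniParser model files into the local Hugging Face cache
local_dir = snapshot_download(repo_id="microsoft/OmniParser")
print("Downloaded to:", local_dir)
```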
Simon Willison explains how to use the Rust mistral.rs library to run the Llama Vision model on an M2 Mac laptop. He provides a detailed example and discusses memory usage and GPU utilization.
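mistral.rs also ships Python bindings, and the request flow looks roughly like the sketch below; the model ID, the architecture enum value, and the package variant (a Metal build exists for Macs) are assumptions to check against the mistralrs documentation:

```python
from mistralrs import ChatCompletionRequest, Runner, VisionArchitecture, Which

# Load a Llama 3.2 Vision-class model; model ID and enum value are assumptions
runner = Runner(
    which=Which.VisionPlain(
        model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
        arch=VisionArchitecture.VLlama,
    ),
)

# OpenAI-style chat completion request mixing an image URL with a question
res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }],
        max_tokens=256,
    )
)
print(res.choices[0].message.content)
```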