SemanticScuttle - klotz.me

Tags: image*

SSTV Capsule V2 for High Altitude Balloons

This project details building a system that sends images in real time from high altitude using SSTV (Slow-Scan Television) and PMR walkie-talkies. It covers hardware setup, code configuration, and launch results. Note: in the US, transmitting on PMR frequencies requires a ham radio license.

2025-04-01 Tags: sstv, high altitude balloon, esp32-cam, 446 mhz, rf, electronics, diy, ham radio, image, robot-36 by klotz

Search/ReSearch: Asking questions of images with AI?

An analysis of how well different AI systems perform in describing images and answering questions about them. The article compares ChatGPT, Gemini, Llama, and Claude using four images: a hand, a bottle of wine, a piece of pastry, and a flower.

2025-03-01 Tags: vlm, image description, chatgpt, gemini, llama, claude, image, dan russell by klotz

Qwen2.5-VL-3B-Instruct

Qwen2.5-VL-3B-Instruct is the latest addition to the Qwen family of vision-language models, developed by the Qwen team at Alibaba Cloud and hosted on Hugging Face. It features enhanced capabilities in understanding visual content and generating structured outputs. It is designed to act as a visual agent that interacts directly with tools and operates computer and phone functions. Qwen2.5-VL can comprehend videos over an hour long and localize objects within images using bounding boxes or points. The Qwen2.5-VL series is available in three sizes: 3, 7, and 72 billion parameters.

2025-02-08 Tags: qwen2.5-vl, vlm, hugging face, image, video, llm, qwen by klotz

Deep Learning for Outlier Detection on Tabular and Image Data

Discussion on the challenges and promises of deep learning for outlier detection in various data modalities, including image and tabular data, with a focus on self-supervised learning techniques.

2025-01-04 Tags: deep learning, outlier detection, image, tabular data, production engineering by klotz

You can now run prompts against images, audio and video in your terminal using LLM

The LLM 0.17 release adds multi-modal input, allowing users to send images, audio, and video files to large language models such as GPT-4o, Llama, and Gemini from the command line or through a Python API, at low per-prompt cost.

2024-10-29 Tags: llm, simon willison, image, audio, video, gpt-4o, gemini, python, cli by klotz

MobileDiffusion: Rapid text-to-image generation on-device – Google Research Blog

2024-02-01 Tags: llm, text, image, mobile, google, diffusion by klotz

Now add a walrus: Prompt engineering in DALL-E 3

The author experiments with the model, asking it to add a walrus to a prompt, and is surprised to find that the model can maintain consistency between images generated from a slightly altered prompt by reusing a "seed" number. The author also delves into the underlying prompt engineering of DALL-E 3, revealing the policies and guidelines that govern the model's image generation, including diversity and inclusivity guidelines.

2024-10-29 Tags: gpt-4, chatgpt, dall-e, image, prompt, llm, simon willison by klotz

Multi-Vector Retriever for RAG on tables, text, and images

2023-10-22 Tags: llm, rag, langchain, multi-modal, text, column, image by klotz

mediapipe Edge ML

2023-08-22 Tags: google, raspberry pi, ml, image, camera, sdk by klotz

GitHub - martinber/noaa-apt: NOAA APT weather satellite image decoder, for Linux, Windows, RPi 2+, OSX and Android+Termux