This tutorial guides you through installing and using an inference snap, specifically Qwen 2.5 VL, a multi-modal large language model. It covers installation, status checks, basic chat, and configuring Open WebUI for image-based prompts.
This paper introduces a computational framework for recovering spectral information from single-shot photos using a specially designed color chart and algorithm, achieving spectral resolution comparable to scientific spectrometers. It eliminates the need for training data or pre-trained models and has potential applications in accessible optical spectroscopy and hyperspectral imaging.
This book covers foundational topics within computer vision, with an image processing and machine learning perspective. It aims to build the reader’s intuition through visualizations and is intended for undergraduate and graduate students, as well as experienced practitioners.
This document details a custom OCR program designed for recovering old computer programs from line-printer listings. It focuses on accuracy for mono-spaced fonts, even at the cost of speed, and outlines the algorithm, implementation details, and necessary preparation steps.
Anthropic's new feature allows specifying a public URL for images/documents in their API, improving performance and usability. The article details implementation and successful testing with Claude 3.7 Sonnet.
A Chrome extension using AI (LLaVa) to generate descriptive filenames for images when downloading them.
The AI Camera is a portable, low-power device that combines a Raspberry Pi Zero, a 2MP camera module, and a range of sensors, designed for capturing and processing images locally with AI capabilities.
This article explores how to incorporate images into a RAG (Retrieval-Augmented Generation) knowledgebase using Large Language Models (LLMs) with vision capabilities. It provides a step-by-step guide to collecting, uploading, and transcribing images for a richer and more detailed knowledgebase.
This article explains how to run inference on a YOLOv8 object detection model using Docker and create a REST API to orchestrate the process. It includes code implementation and a detailed README in the author's GitHub repository for running the API via REST with Docker.
We introduce LayoutLM, one of the renowned models for extracting information from documents, developed by Microsoft. To tailor a solution for our specific needs, we label our documents using Label Studio, an open-source labeling tool, connected to our remote storage AWS S3.