The M.2 Max is an AI inference acceleration card powered by the Metis AIPU, designed to run Large Language Models (LLMs) and Vision Language Models (VLMs) on power-constrained edge and embedded devices. It offers high memory performance in a small footprint and supports complex computer vision tasks using parallel or cascaded models.
Key features include:
- Memory capacities up to 16 GB with various cooling options.
- Support for standard and extended operating temperature ranges.
- Hardware Root-of-Trust for secure boot and firmware integrity.
- Integration via the Voyager SDK and advanced quantization tools.
- Compatibility with PCIe Gen. 3.0 x4, Intel, AMD, and Arm64 processors across Linux and Windows environments.
IBM has introduced Granite 4.0 3B Vision, a specialized vision-language model (VLM) engineered for high-fidelity enterprise document data extraction. Unlike monolithic multimodal models, this release uses a modular LoRA adapter architecture, adding approximately 0.5B parameters to the Granite 4.0 Micro base model. This design allows for efficient dual-mode deployment, activating vision capabilities only when multimodal processing is required. The model excels at converting complex visual elements, such as charts and tables, into structured machine-readable formats like JSON, HTML, and CSV. By utilizing a high-resolution tiling mechanism and a DeepStack architecture for improved spatial alignment, Granite 4.0 3B Vision achieves impressive accuracy in tasks like Key-Value Pair extraction and chart reasoning, ranking highly on industry benchmarks.
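The structured-output side of this workflow can be illustrated with a minimal sketch. The helper names and the key-value schema below are hypothetical, not IBM's actual API: given key-value pairs a document VLM has extracted from an image, serialize them into two of the target formats the model produces, JSON and CSV.

```python
import csv
import io
import json

def to_json(pairs: dict) -> str:
    """Serialize extracted key-value pairs as pretty-printed JSON."""
    return json.dumps(pairs, indent=2, sort_keys=True)

def to_csv(pairs: dict) -> str:
    """Serialize the same pairs as a two-column CSV (key, value)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["key", "value"])
    for key, value in pairs.items():
        writer.writerow([key, value])
    return buf.getvalue()

# Example: pairs a document VLM might extract from an invoice image
# (illustrative data, not real model output).
extracted = {"invoice_number": "INV-0042", "total": "199.00"}
```

In practice the model emits such structured representations directly; a downstream pipeline would validate and route them much like these helpers do.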
The article discusses the evolution of manufacturing beyond 'smart' to an AI-driven future. It argues that while smart manufacturing focused on connectivity and data collection, AI will unlock true transformation by enabling predictive maintenance, optimized supply chains, and personalized product development. The piece outlines ten specific use cases where AI is poised to make a significant impact, including generative design, digital twins, and autonomous quality control. It emphasizes the shift from reactive problem-solving to proactive optimization, ultimately leading to increased efficiency, reduced costs, and improved product quality. The author posits that AI is not just enhancing manufacturing, but fundamentally reshaping it.
M5Stack has launched the AI-88502 LLM Accelerator M.2 Kit, based on the LLM-8850 M.2 card with a 24 TOPS Axera AX8850 SoC, offering an alternative to the Raspberry Pi AI HAT+ 2 for LLM and AI vision workloads.
This study introduces a domain-specific Large Vision-Language Model, the Human-Scene Vision-Language Model (HumanVLM), designed to provide a foundation for human-scene vision-language tasks. The authors create a large-scale human-scene multimodal image-text dataset (HumanCaption-10M), develop a captioning approach for human-centered images, and use these to train HumanVLM.
A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
A collection of Docker-based web user interfaces for running generative AI models locally.
NVIDIA Nemotron Parse v1.1 is designed to understand document semantics and extract text and table elements with spatial grounding. It transforms unstructured documents into actionable, machine-usable representations.
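Spatial grounding means each extracted element carries a bounding box tying it back to its position on the page. A minimal sketch of what a consumer of such output might do, using a hypothetical element schema (not Nemotron Parse's actual output format): sort grounded elements into reading order by box position.

```python
from typing import Dict, List, Tuple

def reading_order(elements: List[Dict]) -> List[str]:
    """Return element texts sorted top-to-bottom, then left-to-right,
    using the (x0, y0, x1, y1) bounding box each element carries."""
    ordered = sorted(elements, key=lambda e: (e["bbox"][1], e["bbox"][0]))
    return [e["text"] for e in ordered]

# Example: three grounded elements with pixel-space bounding boxes
# (illustrative data only).
elems = [
    {"text": "Body paragraph", "bbox": (50, 200, 500, 240)},
    {"text": "Title", "bbox": (50, 40, 300, 80)},
    {"text": "Sidebar", "bbox": (520, 200, 600, 240)},
]
```

Real layout reconstruction is more involved (multi-column pages, tables), but the bounding boxes are what make such post-processing possible at all.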
This tutorial guides you through installing and using an inference snap, specifically Qwen 2.5 VL, a multimodal large language model. It covers installation, status checks, basic chat, and configuring Open WebUI for image-based prompts.
A guide to installing Open Genera 2.0, a Lisp environment originally from Symbolics, on a modern 64-bit Linux system. It details the necessary steps, including installing dependencies, setting up networking, and patching for compatibility.