Tags: edge computing* + llm*

6 bookmarks

  1. PrismML, a venture originating from Caltech, has introduced Bonsai 8B, a 1-bit large language model designed to significantly improve AI efficiency on edge hardware. The architecture represents each weight using only its sign plus a shared scale factor, shrinking the memory footprint to just 1.15 GB. Compared to full-precision models, Bonsai 8B is 14 times smaller, 8 times faster, and 5 times more energy-efficient, while maintaining competitive performance. By drastically reducing memory and power requirements, PrismML aims to enable advanced AI applications on mobile devices, real-time robotics, and secure enterprise systems, moving capable language models out of massive cloud datacenters and onto local hardware. (A rough sketch of the sign-plus-scale representation follows this list.)
  2. NVIDIA has announced support for the Gemma 4 model family, which is designed to run efficiently across a wide range of hardware, from data centers to edge devices such as Jetson. The new generation includes the first Gemma MoE model and supports over 140 languages, enabling advanced capabilities like reasoning, code generation, and multimodal input.
    Developers can fine-tune and deploy Gemma 4 using tools like NeMo Automodel and NVIDIA NIM, with commercial licensing available. The models are optimized for local deployment with frameworks such as vLLM, Ollama, and llama.cpp, offering flexibility for various use cases, including robotics, smart machines, and secure on-premise applications. (A minimal local-inference sketch appears after this list.)
    2026-04-03, by klotz
  3. Orange Pi has announced the Orange Pi AI Station, a compact edge computing platform featuring the Ascend 310 processor, offering up to 176 TOPS of AI compute performance with options for up to 96GB of LPDDR4X memory and NVMe storage.
  4. This article details how to build a fast, offline AI chatbot from a Raspberry Pi 5 and an RLM AA50 accelerator card, covering optimization techniques for the speech recognition, natural language processing, and text-to-speech stages. (A skeleton of that pipeline follows this list.)
  5. This paper proposes SkyMemory, a key-value cache (KVC) hosted on a LEO satellite constellation to accelerate transformer-based inference, particularly for large language models (LLMs). It explores different chunk-to-server mapping strategies (rotation-aware, hop-aware, and combined) and presents simulation results and a proof-of-concept implementation demonstrating performance improvements. (A toy illustration of chunk mapping follows this list.)
  6. The article introduces the concept of Federated Language Models, which pair edge-based Small Language Models (SLMs) with cloud-based Large Language Models (LLMs) to improve privacy and performance in AI applications. (A minimal routing sketch follows this list.)
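
For item 1, a minimal sketch of the sign-plus-shared-scale weight representation described for Bonsai 8B, written in NumPy. The per-tensor scale used here (the mean absolute value of the weights) is an assumption; the bookmark does not say how PrismML computes it.

    import numpy as np

    def quantize_1bit(w: np.ndarray):
        """Binarize a weight tensor to {-1, +1} plus one shared scale factor."""
        scale = float(np.abs(w).mean())          # shared scale (assumed: mean |w|)
        signs = np.where(w >= 0, 1, -1).astype(np.int8)
        return signs, scale

    def dequantize_1bit(signs: np.ndarray, scale: float) -> np.ndarray:
        """Reconstruct an approximate full-precision tensor from signs and scale."""
        return signs.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    signs, scale = quantize_1bit(w)
    w_hat = dequantize_1bit(signs, scale)
    print("mean absolute reconstruction error:", np.abs(w - w_hat).mean())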
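
For item 2, a minimal sketch of querying a locally served model through Ollama's HTTP API (the default endpoint on port 11434). The model tag "gemma4" is an assumed placeholder for illustration; substitute whatever tag the release actually ships under.

    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

    def ask_local_model(prompt: str, model: str = "gemma4") -> str:
        """Send a prompt to a locally running Ollama server and return the reply."""
        payload = {"model": model, "prompt": prompt, "stream": False}
        resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
        resp.raise_for_status()
        return resp.json()["response"]

    print(ask_local_model("Why does on-device inference help with data privacy?"))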
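
For item 4, a skeleton of the offline listen-generate-speak loop such a build typically uses; the component choices here (faster-whisper for speech recognition, a llama.cpp-hosted model for generation, pyttsx3 for speech output) are assumptions rather than the article's exact stack, and accelerator-specific optimizations are omitted.

    from faster_whisper import WhisperModel   # offline speech recognition
    from llama_cpp import Llama               # local LLM inference via llama.cpp
    import pyttsx3                            # offline text-to-speech

    asr = WhisperModel("tiny.en")                      # a small model suits a Pi-class board
    llm = Llama(model_path="model.gguf", n_ctx=2048)   # path to a locally stored quantized model
    tts = pyttsx3.init()

    def chatbot_turn(wav_path: str) -> None:
        """One offline turn: transcribe the audio, generate a reply, speak it aloud."""
        segments, _ = asr.transcribe(wav_path)
        question = " ".join(seg.text for seg in segments)
        reply = llm(f"User: {question}\nAssistant:", max_tokens=128)["choices"][0]["text"]
        tts.say(reply)
        tts.runAndWait()

    chatbot_turn("question.wav")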
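
For item 5, a toy illustration of chunk-to-server mapping. The paper's rotation-aware and hop-aware strategies are only paraphrased here as "shift assignments with the constellation's rotation" and "prefer nearby, lightly loaded satellites"; the details are assumptions, not the authors' algorithms.

    def rotation_aware_map(num_chunks: int, num_sats: int, rotation_step: int) -> dict:
        """Assign KV-cache chunks to satellites, shifting the assignment as the
        constellation rotates so a chunk stays reachable from the same region."""
        return {c: (c + rotation_step) % num_sats for c in range(num_chunks)}

    def hop_aware_map(num_chunks: int, hop_counts: list) -> dict:
        """Greedily place each chunk on the least-loaded of the closest satellites
        (fewest inter-satellite hops from the requesting node)."""
        load = [0] * len(hop_counts)
        mapping = {}
        for c in range(num_chunks):
            best = min(range(len(hop_counts)), key=lambda s: (hop_counts[s], load[s]))
            mapping[c] = best
            load[best] += 1
        return mapping

    print(rotation_aware_map(num_chunks=8, num_sats=5, rotation_step=2))
    print(hop_aware_map(num_chunks=8, hop_counts=[1, 3, 2, 1, 4]))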
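
For item 6, a minimal sketch of the federated pattern: answer on-device with a small model when it is confident, and escalate to a cloud LLM otherwise. The confidence threshold and both query_* helpers are hypothetical placeholders; the article describes the architecture, not an API.

    from dataclasses import dataclass

    @dataclass
    class Answer:
        text: str
        confidence: float   # 0.0-1.0, as reported by the local model (assumed)

    def query_edge_slm(prompt: str) -> Answer:
        """Placeholder for an on-device small language model call."""
        return Answer(text="local draft answer", confidence=0.62)

    def query_cloud_llm(prompt: str) -> str:
        """Placeholder for a cloud LLM call; reached only when escalation is needed."""
        return "cloud answer"

    def federated_answer(prompt: str, threshold: float = 0.8) -> str:
        """Keep the prompt on-device when the SLM is confident; otherwise escalate.
        The prompt could be redacted or summarized before it leaves the device."""
        local = query_edge_slm(prompt)
        if local.confidence >= threshold:
            return local.text
        return query_cloud_llm(prompt)

    print(federated_answer("What does my calendar look like tomorrow?"))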
