Tags: audio*


  1. Google DeepMind has released Gemma 4 12B, a dense multimodal model with an encoder-free architecture. Unlike previous iterations, which used separate vision and audio encoders, this model feeds these modalities directly into the LLM backbone. The streamlined design reduces latency and memory overhead, letting the model perform agentic reasoning tasks on consumer laptops with as little as 16 GB of VRAM while approaching the performance of much larger models such as the 26B MoE variant.

    - Unified decoder-only architecture for text, image, video, and native audio input.
    - Encoder-free design using a 35M vision embedder and direct raw audio wave projection.
    - Optimized to run locally on Apple Silicon Macs and consumer GPU laptops.
    - Released under an Apache 2.0 license with support for llama.cpp, MLX, vLLM, and Ollama.
  2. This article provides a guide on building a DIY high-end Bluetooth speaker in only 30 minutes using an ESP32 and simple hobbyist components. Using a LILYGO T-Display board, which supports the Bluetooth Classic protocol required for A2DP audio streaming, the project pairs with a MAX98357A module to act as a digital-to-analog converter and amplifier. The setup uses the I2S interface for audio transmission and can even display track information and connection status on the onboard TFT screen. It is an excellent, quick project for electronics enthusiasts looking to repurpose spare parts into a functional, polished audio device.
  3. Bring sound and visual expression to your Raspberry Pi and unleash unlimited creativity! Whisplay HAT is a versatile expansion board that integrates high-fidelity audio, a 1.96-inch color LCD screen, RGB indicator lights, and programmable buttons.
  4. This page details how to control shopping cart wheels using audio signals from a phone, exploiting the 7.8 kHz signal used for locking/unlocking. It provides audio files to lock, unlock, arm, and perform purchase checks on Gatekeeper Systems and Rocateq wheels.
  5. Nathan Ladwig has the ESP32 decoding S/PDIF quite effectively, using an onboard peripheral outside its traditional remit. The project lets an ESP32 work as a USB audio device or take an S/PDIF signal as input, and then transmit that audio stream over RTP.
    2025-10-06 by klotz
  6. This repository contains the source code for the summarize-and-chat project. This project provides a unified document summarization and chat framework with LLMs, aiming to address the challenges of building a scalable solution for document summarization while facilitating natural language interactions through chat interfaces.
  7. How to run Gemma 3 effectively with our GGUFs on llama.cpp, Ollama, Open WebUI and how to fine-tune with Unsloth! This page details running Gemma 3 on various platforms, including phones, and fine-tuning it using Unsloth, addressing potential issues with float16 precision and providing optimal configuration settings.
  8. This article details how to rip CDs using the abcde command-line tool in Linux, providing instructions, options, and a sample configuration file for efficient and high-quality audio ripping.
  9. This article details a dsPIC-driven radio build by Minh Danh that receives FM and AM (all bands, SW, MW, LW), plays MP3s from USB, and records incoming signals. It highlights the use of Si4735 for radio functionality, TDA1308 and NS8002 for audio, and a resistor ladder for front panel button input. The project is praised for its thorough documentation.
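The resistor-ladder button input mentioned in the dsPIC radio entry can be illustrated with a minimal sketch. It is not the project's firmware: the ADC resolution, pull-up value, per-button resistances, button names, and tolerance below are all hypothetical, chosen only to show how a single analog pin can distinguish several buttons by the divider voltage each one produces.

```python
from typing import Optional

# All values below are assumptions for illustration, not taken from the project.
ADC_MAX = 1023        # assumed 10-bit ADC full-scale count
R_PULLUP = 10_000     # assumed pull-up to the ADC reference, in ohms
TOLERANCE = 30        # assumed acceptable error, in ADC counts

# Hypothetical resistance to ground seen when each button is pressed, in ohms.
BUTTON_RESISTANCE = {
    "play": 0,
    "prev": 2_200,
    "next": 5_600,
    "record": 12_000,
}


def expected_count(r_to_ground: float) -> int:
    """Return the ADC count for a divider of R_PULLUP over r_to_ground."""
    return round(ADC_MAX * r_to_ground / (R_PULLUP + r_to_ground))


# Precompute the count each button should produce.
EXPECTED = {name: expected_count(r) for name, r in BUTTON_RESISTANCE.items()}


def decode_button(adc_count: int) -> Optional[str]:
    """Map an ADC reading to the nearest button, or None if nothing matches."""
    name, target = min(EXPECTED.items(), key=lambda kv: abs(kv[1] - adc_count))
    return name if abs(target - adc_count) <= TOLERANCE else None


if __name__ == "__main__":
    for reading in (0, 190, 360, 560, 1023):
        print(reading, decode_button(reading))
```

With no button pressed, the pin sits near full scale and matches no entry, so `decode_button` returns `None`. Real firmware would typically also debounce the reading before acting on it.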


SemanticScuttle - klotz.me: tagged with "audio"
