The author explores building a privacy-focused AI concierge for a Reolink video doorbell using locally hosted tools. The project integrates Home Assistant with Piper for text-to-speech, Whisper for speech-to-text, and Ollama for running local large language models, with the aim of automating interactions with visitors when no one is home. Real-time two-way conversation proved impractical because of hardware performance limits and model latency, but the author did build a working system that transcribes visitor messages and sends them as notifications to the owner's phone.
Main points:
Implementing local AI in smart home devices for privacy
Using Home Assistant to orchestrate TTS, STT, and LLM components
Working around hardware bottlenecks and model latency in real-time speech processing
Automating visitor message transcription and mobile notifications
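
The transcription-to-notification flow summarized above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: it assumes the visitor's speech has already been transcribed (for example by Whisper), that Ollama is listening on its default local port, and that Home Assistant is reachable at the address shown. The model name, the Home Assistant URL, the notify service name, and the LLM summarization step itself are all assumptions made for the example.

```python
import json
import urllib.request

# All three values below are assumptions for illustration only.
OLLAMA_URL = "http://localhost:11434/api/generate"   # Ollama's default local endpoint
HA_URL = "http://homeassistant.local:8123"           # hypothetical Home Assistant address
NOTIFY_SERVICE = "mobile_app_phone"                  # hypothetical notify service name


def build_ollama_request(transcript, model="llama3"):
    """Build a non-streaming request body for Ollama's generate endpoint."""
    prompt = (
        "Summarize this doorbell visitor message in one sentence:\n"
        + transcript
    )
    return {"model": model, "prompt": prompt, "stream": False}


def build_notification(summary, transcript):
    """Build the service data for a Home Assistant notify call."""
    return {
        "title": "Doorbell visitor",
        "message": f"{summary}\n\nTranscript: {transcript}",
    }


def post_json(url, payload, token=None):
    """POST a JSON payload and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Home Assistant expects a long-lived access token as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def relay_visitor_message(transcript, ha_token):
    """Summarize a transcript with the local LLM, then notify the phone."""
    reply = post_json(OLLAMA_URL, build_ollama_request(transcript))
    summary = reply.get("response", "").strip()
    post_json(
        f"{HA_URL}/api/services/notify/{NOTIFY_SERVICE}",
        build_notification(summary, transcript),
        token=ha_token,
    )
    return summary
```

Keeping the payload builders separate from the network call makes the slow part (the LLM request) easy to time or skip, which matters given the latency problems described above. Because nothing here needs to respond in real time, the notification can simply be sent whenever the summary is ready.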