
I built a voice agent that turns off the lights — here's what I learned

Cu Cong Can, 10/29/2025

The term "AI Agent" 🤖 has been thrown around a lot lately. But what does it actually mean?

An agent isn't just a chatbot. It's an AI model that can reason, plan, and act — not just generate text, but use tools to interact with the real world. Think: observe a situation → decide what to do → execute an action.

When Google released Gemini 2.5 Flash Native Audio earlier this year, I saw a real opportunity to test this idea.


🧠 What makes Gemini 2.5 Flash Native Audio different

Most voice AI systems work by chaining together separate models — speech-to-text, an LLM, then text-to-speech. Latency adds up fast, and the voice often sounds robotic.

Gemini's native audio stream changes that: it's a single model handling everything end-to-end, with sub-300ms response times and a voice that genuinely sounds natural. It's built for real-time dialogue, not just playback.


💡 The experiment: voice control for smart lights

I built a small prototype using the Gemini Live API — a voice agent that can listen, reason, and take real-world actions. Here's what it can do:

  • Search the internet — answers live questions by calling a web search tool
  • Control smart lights — turns lights on or off via a Raspberry Pi running a local HTTP API
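The two capabilities above could be described to the model as tool declarations. The following is a minimal sketch, assuming the JSON-schema-style declarations commonly used for function calling. The tool names `search_web` and `set_light`, their parameters, and the `missing_args` helper are hypothetical and not taken from the prototype.

```python
# Hypothetical tool declarations; names and parameters are illustrative only.
SEARCH_WEB = {
    "name": "search_web",
    "description": "Search the internet to answer a live question.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What to search for."},
        },
        "required": ["query"],
    },
}

SET_LIGHT = {
    "name": "set_light",
    "description": "Turn the smart light on or off.",
    "parameters": {
        "type": "object",
        "properties": {
            "state": {"type": "string", "enum": ["on", "off"]},
        },
        "required": ["state"],
    },
}

TOOLS = [SEARCH_WEB, SET_LIGHT]


def missing_args(declaration, args):
    """Return the required argument names that are absent from args."""
    required = declaration["parameters"]["required"]
    return [name for name in required if name not in args]
```

Restricting `state` to an enum of `on` and `off` keeps the model from inventing values the Pi endpoint would not understand, and a check like `missing_args` lets the agent reject an incomplete call before acting on it.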

The light control is simple but satisfying. When the agent decides to act, it fires a POST request:

POST http://raspberrypi.local/api/v1/pin/11/on
POST http://raspberrypi.local/api/v1/pin/11/off
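In Python, those requests could be sent with the standard library alone. This is a sketch under assumptions: the host, path, and pin number come from the requests above, while the empty request body, the timeout, and the function names `light_url` and `set_light` are illustrative guesses rather than the prototype's actual code.

```python
import urllib.request

BASE_URL = "http://raspberrypi.local/api/v1"  # host and path from the requests above
LIGHT_PIN = 11  # pin number from the requests above


def light_url(state: str, pin: int = LIGHT_PIN) -> str:
    """Build the endpoint URL for switching a pin on or off."""
    if state not in ("on", "off"):
        raise ValueError(f"state must be 'on' or 'off', got {state!r}")
    return f"{BASE_URL}/pin/{pin}/{state}"


def set_light(state: str, timeout: float = 3.0) -> int:
    """Send the POST request and return the HTTP status code.

    Assumes the Pi accepts an empty body; adjust if the local API differs.
    """
    request = urllib.request.Request(light_url(state), data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `set_light("off")` on the same network as the Pi would issue the second request shown above. Validating `state` before building the URL means a misheard command fails locally instead of hitting an unknown endpoint.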

The loop is: hear the command → decide which tool to call → execute → respond in natural voice. No buttons, no app — just speech.
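The "decide which tool to call, then execute" step of that loop can be sketched as a small dispatcher. This is not the prototype's code: the handlers are stubs, the tool names are hypothetical, and the shape of the tool call (a name plus a dictionary of arguments) is an assumption about what the model hands back.

```python
from typing import Any, Callable, Dict


def search_web(query: str) -> Dict[str, Any]:
    # Stub: a real handler would call a search API here.
    return {"results": f"(stub) results for {query!r}"}


def set_light(state: str) -> Dict[str, Any]:
    # Stub: a real handler would send the POST request to the Pi.
    return {"light": state}


# Maps the tool name chosen by the model to the function that runs it.
HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "search_web": search_web,
    "set_light": set_light,
}


def dispatch(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Run the requested tool and return a result to send back to the model."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**args)
    except Exception as exc:  # report failures instead of crashing the session
        return {"error": str(exc)}
```

Returning errors as data rather than raising lets the model hear about a failed action and explain it to the user in its spoken reply.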

🎬 Demo Video

Feels a Bit Like Talking to JARVIS

There's something oddly cinematic about hearing a calm AI voice respond instantly and watching the lights obey. It reminded me of the Iron Man scene:

“Wake up. Daddy’s home.”

We're not at JARVIS yet. But the gap between conversation and action feels smaller than it did six months ago.


What's next

This is an early prototype. I'm interested in deeper integrations — internal tools, richer emotional voice control, more complex reasoning chains. If you're building in this space or want to try the demo, feel free to reach out.
