I built a voice agent that turns off the lights — here's what I learned

The term "AI Agent" 🤖 gets thrown around a lot lately. But what does it actually mean?
An agent isn't just a chatbot. It's an AI model that can reason, plan, and act — not just generate text, but use tools to interact with the real world. Think: observe a situation → decide what to do → execute an action.
When Google released Gemini 2.5 Flash Native Audio earlier this year, I saw a real opportunity to test this idea.
🧠 What makes Gemini 2.5 Flash Native Audio different
Most voice AI systems work by chaining together separate models — speech-to-text, an LLM, then text-to-speech. Latency adds up fast, and the voice often sounds robotic.
Gemini's native audio stream changes that: it's a single model handling everything end-to-end, with sub-300ms response times and a voice that genuinely sounds natural. It's built for real-time dialogue, not just playback.
💡 The experiment: voice control for smart lights
I built a small prototype using the Gemini Live API — a voice agent that can listen, reason, and take real-world actions. Here's what it can do:
- Search the internet — answers live questions by calling a web search tool
- Control smart lights — turns lights on or off via a Raspberry Pi running a local HTTP API
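As a rough illustration of how two tools like these might be described to the model and routed when it asks for one, here is a minimal sketch. The declarations use the JSON-schema style that function-calling APIs commonly accept. The tool names search_web and set_light, their parameters, the stub handlers, and the dispatch helper are all hypothetical and not taken from the prototype.

```python
from typing import Any, Callable, Dict

# Hypothetical tool declarations in a JSON-schema style.
# The names and parameters are assumptions made for illustration.
TOOL_DECLARATIONS = [
    {
        "name": "search_web",
        "description": "Answer live questions by searching the web.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "set_light",
        "description": "Turn the smart light on or off.",
        "parameters": {
            "type": "object",
            "properties": {"on": {"type": "boolean"}},
            "required": ["on"],
        },
    },
]


def search_web(query: str) -> Dict[str, Any]:
    # Stub: a real handler would call a web search backend here.
    return {"status": "not_implemented", "query": query}


def set_light(on: bool) -> Dict[str, Any]:
    # Stub: a real handler would send the request to the Raspberry Pi.
    return {"status": "ok", "light": "on" if on else "off"}


# Maps the tool name chosen by the model to the local function that runs it.
HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "search_web": search_web,
    "set_light": set_light,
}


def dispatch(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Run the requested tool and return a result the model can describe aloud."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"status": "error", "message": f"unknown tool: {name}"}
    return handler(**args)
```

Returning a structured error for an unknown tool name, rather than raising, lets the agent say that something went wrong instead of dropping the conversation.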
The light control is simple but satisfying. When the agent decides to act, it fires a POST request:
POST http://raspberrypi.local/api/v1/pin/11/on
POST http://raspberrypi.local/api/v1/pin/11/off
The loop is: hear the command → decide which tool to call → execute → respond in natural voice. No buttons, no app, just speech.
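The "execute" step for the lights could look something like the sketch below, which uses only the Python standard library. The host, path, and pin number come from the two requests above. The function names, the empty request body, and the timeout value are assumptions, since the post does not say what the Pi's API expects beyond the URL.

```python
import urllib.request

# Host, path, and pin number as shown in the requests above.
BASE_URL = "http://raspberrypi.local/api/v1"
LIGHT_PIN = 11


def build_pin_url(pin: int, on: bool, base_url: str = BASE_URL) -> str:
    """Build the endpoint URL for switching a pin on or off."""
    state = "on" if on else "off"
    return f"{base_url}/pin/{pin}/{state}"


def post_light_state(on: bool, timeout: float = 3.0) -> int:
    """Send the POST request and return the HTTP status code.

    Assumes the endpoint accepts an empty body. urlopen raises
    urllib.error.URLError if the Pi is unreachable and
    urllib.error.HTTPError on a 4xx or 5xx response.
    """
    request = urllib.request.Request(
        build_pin_url(LIGHT_PIN, on), data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A short timeout matters here: if the Pi is offline, the agent should be able to report the failure quickly rather than leave the speaker waiting in silence.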
Feels a Bit Like Talking to JARVIS
There's something oddly cinematic about hearing a calm AI voice respond instantly and watching the lights obey. It reminded me of the Iron Man scene:
“Wake up. Daddy’s home.”
We're not at JARVIS yet. But the gap between conversation and action feels smaller than it did six months ago.
What's next
This is an early prototype. I'm interested in deeper integrations — internal tools, richer emotional voice control, more complex reasoning chains. If you're building in this space or want to try the demo, feel free to reach out.

