Real-time voice · sub-second latency
Production AI Voice Agent
A production voice agent holding real phone conversations across customer deployments — 100+ calls a day, sub-second latency, clean human handoff.
- Twilio
- LiveKit
- Deepgram
- ElevenLabs
- Azure OpenAI
- TypeScript
Customers were drowning in inbound and outbound calls that needed routing, triage, and follow-up — all done manually. The goal was an AI agent that could hold a real phone conversation, act on it, and hand off to a human cleanly when needed.
Architected and shipped a production voice agent on Twilio for telephony, LiveKit as real-time middleware, Deepgram for speech-to-text, ElevenLabs for text-to-speech, and Azure OpenAI for reasoning. Tuned the pipeline for sub-second end-to-end latency and built seamless AI-to-human handoff so calls never dead-end.
- Handles 100+ inbound and outbound calls per day across customer deployments
- One customer automated 100% of inbound call routing across departments, eliminating manual triage
- Sub-second response latency for natural, real-time conversation
The latency problem
A voice agent lives or dies on latency. With more than a second of dead air the conversation feels broken: the caller starts talking over the agent and turn-taking collapses. The hard part wasn’t any single component; it was the whole pipeline (audio in → transcription → reasoning → speech synthesis → audio out), all of which has to run while the caller is still on the line.
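Because the delay the caller hears is the sum of every stage, it helps to time each stage separately. The following is a minimal sketch of that idea, not the production code: the stage names, the millisecond budgets, and the `checkTurn` helper are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical stages and per-stage budgets in milliseconds.
// The numbers are illustrative assumptions, not measured values.
type Stage = "stt" | "llm" | "tts" | "transport";

const budgetMs: Record<Stage, number> = {
  stt: 200,
  llm: 400,
  tts: 250,
  transport: 100,
};

type TurnTimings = Record<Stage, number>;

// Sums the measured time for one conversational turn and lists
// the stages that ran over their individual budget.
function checkTurn(timings: TurnTimings): { totalMs: number; over: Stage[] } {
  const stages = Object.keys(budgetMs) as Stage[];
  const over = stages.filter((s) => timings[s] > budgetMs[s]);
  const totalMs = stages.reduce((sum, s) => sum + timings[s], 0);
  return { totalMs, over };
}

// Example: the turn is under one second overall, but the reasoning
// step exceeded its own budget and is the place to look first.
const report = checkTurn({ stt: 150, llm: 520, tts: 200, transport: 60 });
console.log(report.totalMs, report.over);
```

Tracking the stages individually makes a regression visible even when the total still happens to land under one second.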
I gave each hop its own latency budget. Deepgram streams STT so transcription starts before the caller finishes speaking, LiveKit serves as the real-time transport so audio frames don’t sit in a buffer, and ElevenLabs streams TTS so the agent starts talking before the full response is generated. Azure OpenAI handles the reasoning in between.
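One common way to let speech start before the full response exists is to cut the model's token stream into sentence-sized chunks and send each chunk to synthesis as soon as it is complete. The sketch below shows only that chunking step, under stated assumptions: `SentenceChunker` is a hypothetical helper, it makes no Azure OpenAI or ElevenLabs calls, and the boundary rule (sentence punctuation followed by whitespace) is deliberately naive, so an abbreviation such as "Dr. Smith" would be split.

```typescript
// Hypothetical helper: buffers streamed text tokens and releases
// complete sentences so TTS can start on the first one early.
class SentenceChunker {
  private buffer = "";

  // Add one token; returns any sentences completed by it.
  push(token: string): string[] {
    this.buffer += token;
    const out: string[] = [];
    // A boundary is ., ! or ? followed by at least one whitespace character.
    const boundary = /[.!?]\s+/;
    let m = boundary.exec(this.buffer);
    while (m !== null) {
      // Keep the punctuation mark, drop the trailing whitespace.
      out.push(this.buffer.slice(0, m.index + 1).trim());
      this.buffer = this.buffer.slice(m.index + m[0].length);
      m = boundary.exec(this.buffer);
    }
    return out;
  }

  // Call once the token stream ends to release whatever is left.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }
}

// Example with a made-up token stream.
const chunker = new SentenceChunker();
const spoken: string[] = [];
for (const token of ["Sure, ", "I can help. ", "What is your ", "account number?"]) {
  spoken.push(...chunker.push(token)); // each chunk would go to TTS here
}
spoken.push(...chunker.flush());
console.log(spoken);
```

In this example the first sentence is released as soon as the second token arrives, while the rest of the reply is still being generated.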
Handoff that doesn’t drop the ball
An AI agent that can’t gracefully escalate is worse than no agent. I built the handoff so that when the agent hits the edge of what it should handle, it transfers to a human with context — the human picks up knowing who’s calling and why, instead of starting cold.
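To make "with context" concrete, here is a minimal sketch of the kind of brief a human could see when they pick up. Everything in it is an assumption for illustration: the `HandoffContext` fields, the `buildHandoffBrief` helper, and the sample data are hypothetical, and the actual transfer through the telephony layer is not shown.

```typescript
// Hypothetical shape of the context passed along with a transfer.
interface TranscriptTurn {
  speaker: "caller" | "agent";
  text: string;
}

interface HandoffContext {
  callerNumber: string;
  callerName?: string; // may be unknown
  reason: string; // why the agent is escalating
  recentTurns: TranscriptTurn[];
}

// Builds a short plain-text brief: who is calling, why, and the
// last few turns of the conversation.
function buildHandoffBrief(ctx: HandoffContext, maxTurns = 4): string {
  const who = ctx.callerName
    ? `${ctx.callerName} (${ctx.callerNumber})`
    : ctx.callerNumber;
  const recent = ctx.recentTurns
    .slice(-maxTurns)
    .map((t) => `${t.speaker}: ${t.text}`);
  return [`Caller: ${who}`, `Reason: ${ctx.reason}`, "Recent conversation:", ...recent].join("\n");
}

// Example with made-up data.
const brief = buildHandoffBrief({
  callerNumber: "+15550100",
  callerName: "Dana",
  reason: "Billing dispute the agent is not allowed to resolve",
  recentTurns: [
    { speaker: "caller", text: "I was charged twice this month." },
    { speaker: "agent", text: "I'm transferring you to billing now." },
  ],
});
console.log(brief);
```

Capping the brief at the last few turns keeps it short enough to read in the seconds before the human starts speaking.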
Outcome
In production the agent handles 100+ calls a day. For one customer it now routes 100% of inbound calls across departments, and the manual triage step simply went away.