Production AI Voice Agent — Anurag Singh Chauhan

A production voice agent holding real phone conversations across customer deployments — 100+ calls a day, sub-second latency, clean human handoff.

The latency problem

A voice agent lives or dies on latency. Anything over a second of dead air and the conversation feels broken — the caller starts talking over the agent, the turn-taking collapses. The hard part wasn’t any single component; it was the pipeline: audio in → transcription → reasoning → speech synthesis → audio out, all while the caller is still on the line.

I gave each hop its own latency budget and picked components to fit it: streaming STT from Deepgram so transcription starts before the caller finishes speaking, LiveKit as the real-time transport so audio frames aren’t sitting in a buffer, and streamed TTS from ElevenLabs so the agent starts talking before the full response is generated. Azure OpenAI handles the reasoning in between.
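To make the per-hop budgeting concrete, here is a minimal sketch in Python. The hop names and millisecond figures are illustrative assumptions, not measurements from the deployed system, and `Hop`, `total_budget_ms` and `over_budget` are hypothetical helpers. The point is only that the budgets sum to less than one second and that each hop can be checked against its own allowance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    budget_ms: int  # illustrative target, not a measured figure


# Hypothetical split of a sub-second turn across the pipeline hops.
BUDGET = [
    Hop("transport_in", 50),
    Hop("stt_final_transcript", 200),
    Hop("llm_first_token", 350),
    Hop("tts_first_audio", 250),
    Hop("transport_out", 50),
]


def total_budget_ms(hops):
    """Sum of all per-hop budgets for one conversational turn."""
    return sum(h.budget_ms for h in hops)


def over_budget(hops, measured_ms):
    """Names of hops whose measured latency exceeds their budget.

    `measured_ms` maps hop name -> observed latency in milliseconds;
    hops with no measurement are treated as within budget.
    """
    return [h.name for h in hops if measured_ms.get(h.name, 0) > h.budget_ms]


if __name__ == "__main__":
    print(total_budget_ms(BUDGET))
    print(over_budget(BUDGET, {"llm_first_token": 500, "tts_first_audio": 100}))
```

Tracking the budget per hop rather than end to end shows which component to tune when a turn runs long, instead of only showing that the turn was slow.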

Handoff that doesn’t drop the ball

An AI agent that can’t gracefully escalate is worse than no agent. I built the handoff so that when the agent hits the edge of what it should handle, it transfers to a human with context — the human picks up knowing who’s calling and why, instead of starting cold.
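One way to picture a handoff "with context" is as a small structured payload that travels with the transfer. The sketch below is an assumption about shape, not the production schema: `HandoffContext`, `build_handoff` and `to_briefing` are hypothetical names, and the summary is passed in as a plain string rather than generated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HandoffContext:
    caller_number: str
    intent: str                      # why they are calling, in a few words
    summary: str                     # short recap for the human agent
    caller_name: Optional[str] = None
    recent_turns: List[str] = field(default_factory=list)


def build_handoff(caller_number, intent, summary, transcript,
                  caller_name=None, max_turns=4):
    """Bundle who is calling, why, and the last few turns of the call."""
    tail = list(transcript[-max_turns:]) if max_turns > 0 else []
    return HandoffContext(
        caller_number=caller_number,
        intent=intent,
        summary=summary,
        caller_name=caller_name,
        recent_turns=tail,
    )


def to_briefing(ctx):
    """One-line briefing the human sees or hears before picking up."""
    who = ctx.caller_name or ctx.caller_number
    return f"{who} calling about {ctx.intent}. {ctx.summary}"


if __name__ == "__main__":
    ctx = build_handoff(
        "+15550100",
        "billing dispute",
        "Charged twice in March.",
        ["turn 1", "turn 2", "turn 3"],
        max_turns=2,
    )
    print(to_briefing(ctx))
```

Keeping only the last few turns is a deliberate choice in this sketch: the human gets enough to avoid asking the caller to repeat themselves without reading a full transcript mid-transfer.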

Outcome

In production it handles 100+ calls a day. For one customer it now routes 100% of inbound calls across departments, and the manual triage step simply went away.