voice agents.
Four production voice agents on Retell AI. Over 10,000 inbound calls in the last month on the largest one. HIPAA-compliant patient follow-ups for a US medical clinic. Real estate lead qualification across web and inbound channels. Outbound campaign filtering against a 6,000-contact list. Built and maintained, not demoed.
- High-volume inbound voice agent
An inbound voice agent handling customer calls during off-hours and when human agents are unavailable. Over 10,000 calls in the last month.
- Real estate lead qualification voice agent
A voice agent deployed on the client's website and as an inbound call handler, qualifying leads and booking property viewings through natural conversation.
- Lead engagement voice agent
An outbound voice agent that worked a 6,000-contact lead list to surface who was actually worth the sales team's time.
- HIPAA-compliant voice agent for patient follow-up
A HIPAA-compliant voice agent handling patient follow-ups and appointment booking for a US medical clinic, integrated with DrChrono and GoHighLevel.
Retell AI is the voice surface on every agent I’ve shipped. n8n handles orchestration around the call: state, handoff, escalation, and the integration layer to whatever the client already runs on. For healthcare deployments, GoHighLevel handles the automation around call events and DrChrono is the appointment source of truth. For real estate, the listings layer is Supabase and the conversation reads the same data the public website reads. For outbound campaigns, n8n drives the list and the qualification routing.
None of this is fancy. The interesting engineering is at the seams between Retell and the systems the business actually runs on, not inside the voice layer itself. The voice layer is a commodity at this point. The integration discipline isn’t.
- latency: Every additional millisecond between the caller’s last word and the agent’s first reply is a confidence tax. I tune the integration path for the common case to keep that delay under what humans expect from a phone call.
- handoff: The agent has to know when to stop talking and pass the conversation to a human. Handoff logic is half the work, especially in regulated cases. A confident wrong answer is worse than a clean handoff.
- tone: The agent is not trying to be a human. It’s trying to be useful, fast, and bounded. Most voice agents fail because they oversell the personality and undersell the structure.
- cost: Per-call cost compounds. I budget inference cost into the architecture from day one rather than discovering it at the client’s monthly invoice.
- regulation: For HIPAA and other regulated contexts, the architecture decisions are about what data the voice surface can see and where the recording boundary is. The voice agent doesn’t get to know things it shouldn’t.
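To make the cost point concrete, here is a minimal budgeting sketch. Every rate in it is a hypothetical placeholder, not real pricing from Retell or any other vendor, and the `CallCostModel` name is made up for illustration. It only shows how a per-minute cost turns into a monthly figure once call volume is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCostModel:
    """Per-minute cost components in USD. All values are placeholders."""
    voice_platform_per_min: float
    llm_per_min: float
    telephony_per_min: float

    def per_call(self, avg_minutes: float) -> float:
        # Sum the per-minute components, then scale by average call length.
        per_min = (
            self.voice_platform_per_min
            + self.llm_per_min
            + self.telephony_per_min
        )
        return per_min * avg_minutes

    def monthly(self, calls_per_month: int, avg_minutes: float) -> float:
        # Per-call cost multiplied by volume: this is where it compounds.
        return self.per_call(avg_minutes) * calls_per_month


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
model = CallCostModel(
    voice_platform_per_min=0.07,
    llm_per_min=0.03,
    telephony_per_min=0.01,
)

print(f"per call: ${model.per_call(avg_minutes=2.0):.2f}")
print(f"per month at 10,000 calls: ${model.monthly(10_000, avg_minutes=2.0):,.2f}")
```

Running the same model with a longer average call or a pricier model tier shows quickly whether a design choice is affordable at the expected volume.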
How long does a production voice agent take to build?
About three weeks for the build, then another three weeks of active tuning after it goes live. The first phase is conversation design, integration wiring, and handoff logic. The second phase is daily transcript review against real callers, who never sound like the test scripts. Skipping the second phase is how voice agents end up embarrassing the people who deployed them.
Is Retell AI the right choice for production voice agents?
For most use cases I've shipped, yes. Retell handles latency, interruption, and turn-taking better than the alternatives I've evaluated, and the SDK lets me wire integrations cleanly to n8n, GoHighLevel, and custom CRMs. For regulated cases (HIPAA, financial), the architecture matters more than the platform. The platform just has to not get in the way.
Can voice agents be HIPAA-compliant?
Yes. The voice agent I built for a US medical clinic handles patient follow-ups and appointment booking through DrChrono, with GoHighLevel handling the automation around call events. The architecture decisions that matter most are the handoff logic when anything clinical comes up, the pacing around sensitive topics, and never exposing patient data outside the compliant surface.
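One way to picture "never exposing patient data outside the compliant surface" is an allowlist applied before any record reaches the voice layer. The sketch below is illustrative only: the field names and the `minimize_for_voice` helper are hypothetical, not DrChrono's schema or the clinic's actual implementation, and an allowlist on its own does not make a system HIPAA-compliant.

```python
# Hypothetical allowlist: the only fields the voice layer may ever receive.
# Real field names depend on the EHR integration.
VOICE_VISIBLE_FIELDS = frozenset(
    {"first_name", "appointment_time", "provider_name"}
)


def minimize_for_voice(record: dict) -> dict:
    """Return a copy of the record containing only allowlisted fields.

    Anything not explicitly allowed is dropped, so a new field added
    upstream stays hidden until someone deliberately exposes it.
    """
    return {k: v for k, v in record.items() if k in VOICE_VISIBLE_FIELDS}


# Example record with made-up values.
patient_record = {
    "first_name": "Alex",
    "appointment_time": "2025-01-15T09:30",
    "provider_name": "Dr. Example",
    "diagnosis": "redacted-example",
    "insurance_id": "redacted-example",
}

voice_context = minimize_for_voice(patient_record)
print(sorted(voice_context))
```

The design choice is default-deny: the voice agent sees a field only because someone listed it, never because nobody remembered to remove it.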
What call volume can a single voice agent handle?
The inbound voice agent I built for a Multiskills IT client crossed 10,000 calls in the last month alone. Volume is rarely the engineering constraint; conversation quality at the long tail is. Tuning for the common 80% of calls is the easy part. The work is in the remaining 20% where the agent has to know when to gracefully hand off instead of trying to solve it alone.
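The long-tail handoff decision can be sketched as a small set of explicit rules checked on every turn. This is an assumption-heavy illustration rather than the production logic: the `TurnState` fields, the thresholds, and the restricted-topic list are all hypothetical, and the confidence score is assumed to come from whatever intent detection sits upstream.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class TurnState:
    transcript: str                 # caller's latest utterance
    intent_confidence: float        # 0..1, assumed to come from upstream
    failed_turns: int               # consecutive turns the agent could not resolve
    caller_requested_human: bool = False


def should_hand_off(
    state: TurnState,
    restricted_topics: Sequence[str] = ("refund", "legal", "complaint"),
    min_confidence: float = 0.6,
    max_failed_turns: int = 2,
) -> Tuple[bool, str]:
    """Return (hand_off, reason). Rules are checked in priority order."""
    text = state.transcript.lower()
    if state.caller_requested_human:
        return True, "caller_request"
    if any(topic in text for topic in restricted_topics):
        return True, "restricted_topic"
    if state.failed_turns >= max_failed_turns:
        return True, "repeated_failure"
    if state.intent_confidence < min_confidence:
        return True, "low_confidence"
    return False, "continue"


print(should_hand_off(TurnState("I want to file a complaint", 0.95, 0)))
print(should_hand_off(TurnState("Can I book a viewing on Tuesday?", 0.9, 0)))
```

Returning a reason alongside the decision keeps transcript review cheap: each handoff in the logs says which rule fired, which makes it clear which threshold to retune.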
What happens after the voice agent goes live?
It gets reviewed daily for the first three weeks. Real callers interrupt, mumble, get angry, and ask things nobody wrote a handler for. The agent only gets good if someone is watching transcripts in production, catching failure modes, and retuning. After the initial three-week shakedown, ongoing tuning settles into a weekly cadence as call patterns shift.
Building or scaling a voice agent? I take on a small number of voice-agent engagements at a time, scoped tight. The fastest way to start is a 20-minute call to see if the shape fits.