The actual definition
A voice agent is software that answers (or places) phone calls, using an LLM as the conversational brain. Three pieces always work together:
- ASR — Automatic Speech Recognition. Turns the caller's audio into text. Deepgram, Whisper, AssemblyAI.
- LLM — Large language model. GPT-4o, Claude, etc. Reads the transcribed text and produces a text response.
- TTS — Text-to-speech. Turns the response back into audio. ElevenLabs, Cartesia, OpenAI TTS.
A voice-agent platform (Vapi, Retell, Bland) is the orchestrator that wires these three pieces together and connects them to a phone provider such as Twilio.
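To make the ASR, LLM, TTS hand-off concrete, here is a minimal sketch of a single conversational turn. It is not how Vapi, Retell, or Bland implement it: `VoicePipeline`, `handle_turn`, and the three `fake_*` stages are hypothetical names invented for illustration, and the stand-in stages only pass strings around so the example runs without any vendor SDK. Real platforms also stream audio and handle interruptions, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """One conversational turn: caller audio in, reply audio out."""

    transcribe: Callable[[bytes], str]  # ASR: caller audio -> text
    respond: Callable[[str], str]       # LLM: caller text -> reply text
    synthesize: Callable[[str], bytes]  # TTS: reply text -> audio

    def handle_turn(self, caller_audio: bytes) -> bytes:
        text = self.transcribe(caller_audio)
        reply = self.respond(text)
        return self.synthesize(reply)


# Stand-in stages (assumptions, not real vendor calls).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_asr, fake_llm, fake_tts)
reply_audio = pipeline.handle_turn(b"I need a plumber")
print(reply_audio.decode("utf-8"))  # You said: I need a plumber
```

Swapping a vendor in or out means replacing one of the three callables; the orchestration in `handle_turn` stays the same, which is roughly the job a voice-agent platform does for you.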
The three things voice agents are actually good at
- Answering after-hours. The bar to beat is "voicemail." A halfway-decent voice agent that captures the caller's intent and the basics is already a 10× win.
- Qualifying leads. "What service are you calling about, when do you need it, what's your zip?" Three questions, structured output, off to the CRM.
- Booking appointments. Read available slots from a calendar, suggest two, confirm one, send the calendar invite.
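The qualifying and booking tasks above are mostly bookkeeping, which is why agents handle them well. The sketch below shows one way to model them, under stated assumptions: the `Lead` fields, `next_question`, and `suggest_slots` are hypothetical names, and the slot list is hard-coded rather than read from a real calendar.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import List, Optional

# The three qualifying questions, keyed by the Lead field each one fills.
QUESTIONS = {
    "service": "What service are you calling about?",
    "needed_by": "When do you need it?",
    "zip_code": "What's your zip?",
}


@dataclass
class Lead:
    service: Optional[str] = None
    needed_by: Optional[str] = None
    zip_code: Optional[str] = None


def next_question(lead: Lead) -> Optional[str]:
    """Return the first unanswered question, or None once the lead is complete."""
    for field_name, question in QUESTIONS.items():
        if getattr(lead, field_name) is None:
            return question
    return None


def suggest_slots(available: List[datetime], now: datetime, count: int = 2) -> List[datetime]:
    """Offer the earliest `count` slots that are still in the future."""
    future = sorted(slot for slot in available if slot > now)
    return future[:count]


lead = Lead(service="AC repair")
print(next_question(lead))  # When do you need it?

lead.needed_by = "this week"
lead.zip_code = "78701"
crm_payload = asdict(lead)  # plain dict, ready to send to a CRM

now = datetime(2025, 1, 6, 9, 0)
calendar_slots = [
    datetime(2025, 1, 6, 8, 0),   # already in the past, skipped
    datetime(2025, 1, 7, 14, 0),
    datetime(2025, 1, 6, 11, 0),
    datetime(2025, 1, 8, 10, 0),
]
offered = suggest_slots(calendar_slots, now)
```

Offering only two slots keeps the spoken choice short; the caller confirms one, and only then would the agent create the booking and send the invite.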
The three things they're bad at
- Free-form complaints. Anything emotional, anything requiring real empathy, anything where the caller is angry. Route to a human.
- High-stakes diagnostics. "My AC is making a weird noise" — the agent can capture the symptom, but it should not be diagnosing or quoting.
- Anything legally regulated. Medical advice, legal advice, financial advice. Hard stop.
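One way to enforce these three limits is a hand-off check that runs on every caller utterance before the agent replies. The sketch below is deliberately crude: the rule names, keyword lists, and `escalation_reason` function are all illustrative assumptions, and a production agent would more likely rely on an LLM classifier or the platform's own transfer tooling than on regular expressions.

```python
import re
from typing import Optional

# Illustrative keyword rules, one per "bad at" category. The word lists are
# assumptions for this sketch and would need tuning against real calls.
ESCALATION_RULES = [
    ("complaint", re.compile(
        r"\b(complain\w*|furious|angry|unacceptable|refund)\b", re.IGNORECASE)),
    ("diagnostic_or_quote", re.compile(
        r"\b(what's wrong with|how much will it cost|quote|estimate)\b", re.IGNORECASE)),
    ("regulated", re.compile(
        r"\b(medical|legal|financial) advice\b", re.IGNORECASE)),
]


def escalation_reason(utterance: str) -> Optional[str]:
    """Return the first matching hand-off category, or None if the agent may continue."""
    for reason, pattern in ESCALATION_RULES:
        if pattern.search(utterance):
            return reason
    return None


print(escalation_reason("I'm furious, this is unacceptable"))      # complaint
print(escalation_reason("Can you give me a quote for a new unit?"))  # diagnostic_or_quote
print(escalation_reason("I'd like to book a tune-up for Tuesday"))   # None
```

When a reason comes back, the agent stops improvising: it records what the caller said and routes the call to a human instead of answering.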
Where this course is going
Over the next 11 lessons you'll build a real voice agent on Vapi that:
- Answers calls on a real phone number
- Qualifies inbound leads with a 4-question script
- Books appointments against your Cal.com calendar
- Pushes transcripts into a Supabase table
- Triggers an SMS follow-up after every call