Voice Interface Design Patterns
Voice interfaces break most of the assumptions text chat makes. Responses need to be two or three sentences, not paragraphs; users will interrupt mid-reply; and screen-based feedback shifts to audio cues plus a pulsing-orb affordance. This lesson covers the patterns unique to voice.
Voice-Specific Design Principles
- Keep responses short - 2-3 sentences max. If the answer is complex, chunk it: "Here's a quick answer. Want me to go deeper?"
- Confirm before acting - Speech recognition can mishear, and the AI acts on what it heard, not on what the user meant. For destructive actions, ask first: "I'll delete the Monday meeting. Should I go ahead?"
- Handle interruptions - Users will cut the AI off mid-sentence. Stop immediately and listen - don't finish the response first.
- Provide visual feedback - Even voice-first interfaces need visual cues: a pulsing orb while listening, a spinner while processing, text transcription of what was heard.
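The first two principles can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names, the `DESTRUCTIVE_ACTIONS` set, and the naive regex sentence splitter are all assumptions made for the example. A real system would use a proper sentence tokenizer and its own action catalogue.

```python
import re
from typing import List, Optional, Tuple

MAX_SPOKEN_SENTENCES = 3  # upper end of the "2-3 sentences max" guideline
FOLLOW_UP_PROMPT = "Want me to go deeper?"

# Hypothetical set of action names that should never run without a spoken confirmation.
DESTRUCTIVE_ACTIONS = {"delete", "cancel"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_for_voice(text: str, limit: int = MAX_SPOKEN_SENTENCES) -> Tuple[str, str]:
    """Return (spoken_now, held_back).

    Short answers are spoken whole. Longer answers are cut so that the
    opening sentences plus the follow-up prompt stay within `limit`.
    """
    sentences = split_sentences(text)
    if len(sentences) <= limit:
        return " ".join(sentences), ""
    spoken = sentences[: limit - 1] + [FOLLOW_UP_PROMPT]
    held_back = sentences[limit - 1:]
    return " ".join(spoken), " ".join(held_back)


def confirmation_prompt(action: str, target: str) -> Optional[str]:
    """Return a spoken confirmation for destructive actions, or None if none is needed."""
    if action not in DESTRUCTIVE_ACTIONS:
        return None
    return f"I'll {action} {target}. Should I go ahead?"
```

For example, `confirmation_prompt("delete", "the Monday meeting")` produces the confirmation line quoted above, while a harmless action such as `"read"` returns `None` and can proceed directly. The held-back text is what the assistant speaks only if the user accepts the follow-up offer.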
The Voice Interaction Loop
- Wake / Trigger - User activates voice input via button press, wake word, or always-on listening.
- Listening - Show visual feedback (pulsing animation, waveform). Capture audio.
- Transcription - Show the transcribed text so users can verify what was heard.
- Processing - Show "thinking" state. Keep it short - voice users are less patient than text users.
- Response - Speak the response + show visual companion (text, card, image).
- Follow-up - Offer next actions or stay in listening mode for follow-up.
Design each state explicitly. The transition between listening, processing, and responding should feel smooth, not jarring.
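One way to design each state explicitly is a small state machine with a fixed transition table. The sketch below is an assumption-laden illustration: the event names (`wake`, `speech_end`, `user_interrupt`, and so on), the `IDLE` resting state, and the cue strings are hypothetical, chosen to mirror the loop above. Note the `user_interrupt` transition, which sends the session straight from responding back to listening so the AI stops talking when the user cuts in.

```python
from enum import Enum, auto


class VoiceState(Enum):
    IDLE = auto()          # waiting for a wake / trigger
    LISTENING = auto()
    TRANSCRIBING = auto()
    PROCESSING = auto()
    RESPONDING = auto()
    FOLLOW_UP = auto()


# Allowed transitions: (current state, event) -> next state.
# Event names are hypothetical, not taken from any real SDK.
TRANSITIONS = {
    (VoiceState.IDLE, "wake"): VoiceState.LISTENING,
    (VoiceState.LISTENING, "speech_end"): VoiceState.TRANSCRIBING,
    (VoiceState.TRANSCRIBING, "transcript_ready"): VoiceState.PROCESSING,
    (VoiceState.PROCESSING, "response_ready"): VoiceState.RESPONDING,
    (VoiceState.RESPONDING, "speech_done"): VoiceState.FOLLOW_UP,
    (VoiceState.RESPONDING, "user_interrupt"): VoiceState.LISTENING,  # barge-in
    (VoiceState.FOLLOW_UP, "user_speaks"): VoiceState.LISTENING,
    (VoiceState.FOLLOW_UP, "timeout"): VoiceState.IDLE,
}

# Visual feedback to show in each state, following the loop description.
VISUAL_CUE = {
    VoiceState.IDLE: "idle orb",
    VoiceState.LISTENING: "pulsing orb or waveform",
    VoiceState.TRANSCRIBING: "transcribed text",
    VoiceState.PROCESSING: "thinking spinner",
    VoiceState.RESPONDING: "text, card, or image companion",
    VoiceState.FOLLOW_UP: "suggested next actions",
}


class VoiceSession:
    """Tracks the current state and rejects transitions the table does not allow."""

    def __init__(self) -> None:
        self.state = VoiceState.IDLE

    def handle(self, event: str) -> VoiceState:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state

    @property
    def visual_cue(self) -> str:
        return VISUAL_CUE[self.state]
```

Keeping the transitions in one table makes it easy to see that every state has a defined visual cue and a defined way out, which is where jarring behavior usually hides.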
When Voice Needs a Visual Companion
Some information doesn't work in voice-only:
- Lists longer than 3 items ("Here are the 7 restaurants near you..." - no one remembers all 7)
- Anything with numbers, URLs, or code
- Comparisons ("Option A costs $45/month with 10GB, Option B costs...")
For these, the voice gives a summary and the screen shows the detail: "I found 7 restaurants nearby. Here they are on your screen." This is the pattern Siri and Google Assistant use - voice for the headline, screen for the data.
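The "voice for the headline, screen for the data" split can be expressed as a small routing function. The sketch below is illustrative only: `VoiceResponse`, `present_list`, and the `HARD_TO_SPEAK` heuristic (any digit, URL, or code-like punctuation) are assumptions for the example, and the caller is expected to pass an already-pluralized noun phrase.

```python
import re
from dataclasses import dataclass, field
from typing import List

MAX_SPOKEN_ITEMS = 3  # lists longer than this go to the screen

# Rough heuristic for content that is hard to follow by ear:
# URLs, any digits, or code-like punctuation.
HARD_TO_SPEAK = re.compile(r"https?://|\d|[{}<>=;]")


@dataclass
class VoiceResponse:
    spoken: str                                          # what the assistant says aloud
    on_screen: List[str] = field(default_factory=list)   # detail shown visually


def present_list(noun: str, items: List[str]) -> VoiceResponse:
    """Speak short, simple lists in full; otherwise speak a summary and show the rest."""
    if not items:
        return VoiceResponse(spoken=f"I didn't find any {noun}.")
    needs_screen = len(items) > MAX_SPOKEN_ITEMS or any(
        HARD_TO_SPEAK.search(item) for item in items
    )
    if not needs_screen:
        return VoiceResponse(spoken=f"I found {len(items)} {noun}: " + ", ".join(items) + ".")
    return VoiceResponse(
        spoken=f"I found {len(items)} {noun}. Here they are on your screen.",
        on_screen=list(items),
    )
```

With seven restaurant names, `present_list("restaurants nearby", names)` speaks only the headline and puts all seven on screen. With two plain names it reads them aloud and leaves the screen empty.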