Advanced Patterns: Lesson 8 of 11

Voice Interface Design Patterns

Conversational UI for Designers | Updated Apr 2, 2026

Voice interfaces break most of the assumptions text chat makes. Responses need to be two or three sentences, not paragraphs; users will interrupt mid-reply; and on-screen feedback gives way to audio cues, backed by a minimal visual indicator such as a pulsing orb. This lesson covers the patterns unique to voice.

Voice-Specific Design Principles

Keep responses short - 2-3 sentences max. If complex, chunk it: "Here's a quick answer. Want me to go deeper?"
Confirm before acting - Speech recognition can mishear, and the AI will act on whatever it heard. For destructive actions, read the action back first: "I'll delete the Monday meeting. Should I go ahead?"
Handle interruptions - Users will cut the AI off mid-sentence. Stop immediately and listen - don't finish the response first.
Provide visual feedback - Even voice-first interfaces need visual cues: a pulsing orb while listening, a spinner while processing, text transcription of what was heard.
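The first two principles can be sketched in code. This is a minimal TypeScript illustration, not a production implementation: it assumes sentences end in `.`, `!` or `?`, and every name in it (`shapeForVoice`, `confirmationPrompt`, `isAffirmative`) is hypothetical rather than part of any voice SDK.

```typescript
// Sketch of two voice principles: chunk long answers, confirm destructive actions.
// All names here are hypothetical, chosen for illustration only.

const MAX_SPOKEN_SENTENCES = 3;

interface SpokenReply {
  speak: string;     // what the text-to-speech engine reads aloud now
  remainder: string; // held back until the user asks to go deeper
}

// Naive sentence splitter: a sentence is text up to ., ! or ?, or end of input.
// Abbreviations and decimals ("Dr.", "3.5") will split early; fine for a sketch.
function splitSentences(text: string): string[] {
  return (text.match(/[^.!?]+(?:[.!?]+|$)/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function shapeForVoice(answer: string): SpokenReply {
  const sentences = splitSentences(answer);
  if (sentences.length <= MAX_SPOKEN_SENTENCES) {
    return { speak: sentences.join(" "), remainder: "" };
  }
  // Too long: speak two sentences and spend the third slot on the follow-up offer.
  const keep = MAX_SPOKEN_SENTENCES - 1;
  return {
    speak: `${sentences.slice(0, keep).join(" ")} Want me to go deeper?`,
    remainder: sentences.slice(keep).join(" "),
  };
}

interface PendingAction {
  description: string; // a verb phrase, e.g. "delete the Monday meeting"
  destructive: boolean;
}

// Destructive actions are read back so the user can catch a mis-hearing.
function confirmationPrompt(action: PendingAction): string | null {
  return action.destructive ? `I'll ${action.description}. Should I go ahead?` : null;
}

const AFFIRMATIVE = new Set(["yes", "yeah", "yep", "sure", "go ahead", "do it"]);

// Deliberately strict: only an exact phrase from the list proceeds;
// silence or anything else is treated as "no".
function isAffirmative(utterance: string): boolean {
  const normalized = utterance.toLowerCase().replace(/[^a-z ]/g, "").trim();
  return AFFIRMATIVE.has(normalized);
}
```

Holding the remainder back means a "yes, go deeper" can be answered immediately from what was already generated. Erring toward "no" in `isAffirmative` matches the intent of confirming at all: a missed "yes" costs one repeat, while a misheard one deletes a meeting.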

The Voice Interaction Loop

Wake / Trigger

User activates voice input via button press, wake word, or always-on listening.

Listening

Show visual feedback (pulsing animation, waveform). Capture audio.

Transcription

Show the transcribed text so users can verify what was heard.

Processing

Show "thinking" state. Keep it short - voice users are less patient than text users.

Response

Speak the response and show a visual companion (text, card, or image).

Follow-up

Offer next actions or stay in listening mode for follow-up.

Design each state explicitly. The transition between listening, processing, and responding should feel smooth, not jarring.
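One way to make every state explicit is to model the loop as a small state machine. The sketch below is a hypothetical TypeScript reducer: the state and event names are assumptions chosen to mirror the steps above, and the barge-in transition (user speech during `responding` goes straight back to `listening`) implements the interruption principle. Stopping audio playback is left to the caller, which is assumed to cancel text-to-speech whenever the state leaves `responding`.

```typescript
// States mirroring the voice interaction loop (names are illustrative).
type VoiceState = "idle" | "listening" | "transcribing" | "processing" | "responding" | "followUp";

// Hypothetical events a voice UI might receive.
type VoiceEvent =
  | { type: "TRIGGER" }                           // button press or wake word
  | { type: "SPEECH_END" }                        // end of user speech detected
  | { type: "TRANSCRIPT_READY"; text: string }
  | { type: "REPLY_READY"; reply: string }
  | { type: "PLAYBACK_DONE" }
  | { type: "USER_SPEECH" }                       // user starts talking
  | { type: "TIMEOUT" };                          // no follow-up within the window

interface VoiceContext {
  state: VoiceState;
  transcript: string; // shown on screen so users can verify what was heard
  reply: string;
}

// One explicit visual cue per state, so no state is left undesigned.
const VISUAL_CUE: Record<VoiceState, string> = {
  idle: "mic button",
  listening: "pulsing orb or waveform",
  transcribing: "live transcript",
  processing: "thinking indicator",
  responding: "spoken reply plus visual companion",
  followUp: "suggested next actions",
};

// Pure transition function: events that don't apply to a state are ignored.
function transition(ctx: VoiceContext, event: VoiceEvent): VoiceContext {
  switch (ctx.state) {
    case "idle":
      return event.type === "TRIGGER" ? { state: "listening", transcript: "", reply: "" } : ctx;
    case "listening":
      return event.type === "SPEECH_END" ? { ...ctx, state: "transcribing" } : ctx;
    case "transcribing":
      return event.type === "TRANSCRIPT_READY"
        ? { ...ctx, state: "processing", transcript: event.text }
        : ctx;
    case "processing":
      return event.type === "REPLY_READY" ? { ...ctx, state: "responding", reply: event.reply } : ctx;
    case "responding":
      // Barge-in: stop speaking immediately and listen; don't finish the reply.
      if (event.type === "USER_SPEECH") return { ...ctx, state: "listening", transcript: "" };
      return event.type === "PLAYBACK_DONE" ? { ...ctx, state: "followUp" } : ctx;
    case "followUp":
      // Stay open for a follow-up, then fall back to idle.
      if (event.type === "USER_SPEECH") return { ...ctx, state: "listening", transcript: "" };
      return event.type === "TIMEOUT" ? { ...ctx, state: "idle" } : ctx;
  }
}
```

Keeping the transitions in one pure function makes the "smooth, not jarring" requirement reviewable: every state has exactly one cue in `VISUAL_CUE`, and an unexpected event leaves the UI where it was instead of jumping somewhere undesigned.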

When Voice Needs a Visual Companion

Some information doesn't work in voice-only:

Lists longer than 3 items ("Here are the 7 restaurants near you..." - no one remembers all 7)
Anything with numbers, URLs, or code
Comparisons ("Option A costs $45/month with 10GB, Option B costs...")

For these, the voice says a summary and the visual shows the detail: "I found 7 restaurants nearby. Here they are on your screen." This is the pattern Siri and Google Assistant use - voice for the headline, screen for the data.
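The rule of thumb above can be written as a small decision function. This is a sketch under stated assumptions: the `AssistantResult` shape and the heuristics (more than 3 items, any digit, a URL, backtick-quoted code, or a flagged comparison) are illustrative choices, not how Siri or Google Assistant actually decide.

```typescript
// Hypothetical result shape returned by the assistant's back end.
interface AssistantResult {
  noun: string;          // how to name the items aloud, e.g. "restaurants nearby"
  items: string[];       // the individual results
  detailText: string;    // the full answer, as it would be read aloud
  isComparison: boolean; // options compared on several attributes
}

interface ResponsePlan {
  speak: string;         // what the voice says
  showOnScreen: boolean; // whether to render the visual companion
}

const MAX_SPOKEN_ITEMS = 3;

// Assumed heuristics for content that is hard to follow by ear.
const HARD_TO_HEAR: RegExp[] = [
  /\d/,             // any number (prices, times, phone numbers)
  /https?:\/\/\S+/, // URLs
  /`[^`]+`/,        // inline code
];

function needsVisualCompanion(r: AssistantResult): boolean {
  if (r.items.length > MAX_SPOKEN_ITEMS) return true; // long lists are forgotten
  if (r.isComparison) return true;                     // side-by-side options
  return HARD_TO_HEAR.some((pattern) => pattern.test(r.detailText));
}

function planResponse(r: AssistantResult): ResponsePlan {
  if (!needsVisualCompanion(r)) {
    return { speak: r.detailText, showOnScreen: false };
  }
  // Voice gives the headline; the screen carries the data.
  const headline =
    r.items.length > 0 ? `I found ${r.items.length} ${r.noun}.` : "I've got the details.";
  return { speak: `${headline} Here they are on your screen.`, showOnScreen: true };
}
```

For seven results with `noun: "restaurants nearby"`, `planResponse` speaks "I found 7 restaurants nearby. Here they are on your screen." The heuristics deliberately err toward the screen: a redundant visual costs little, while a spoken list of seven items is lost on the listener.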
