Multimodal Interaction
Combine voice, touch, gesture, text, and visual input for natural interaction.
What is Multimodal Interaction?
Multimodal Interaction lets users communicate through voice, touch, gestures, text, and visual input, switching seamlessly as context demands. Instead of forcing a single input method, the system adapts to how users naturally interact. This flexibility is essential for accessibility, on mobile devices, and in environments where certain inputs aren't practical. Examples include Google Assistant combining voice and touch, iPad Pro blending Pencil and voice, and Tesla mixing voice, touch, and automatic responses.
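One way to picture "the system adapts to how users naturally interact" is a single intent pipeline fed by every modality. The sketch below is a minimal, hypothetical illustration (all names such as `resolveIntent` and `InputEventLike` are made up for this example, not a real API): each input method is normalized into the same intent representation, so the app responds identically however the request arrives.

```typescript
// Hypothetical sketch: route several input modalities into one
// shared intent handler, so the app reacts the same way
// regardless of how the user expressed the request.

type Modality = "voice" | "touch" | "gesture" | "text";

interface InputEventLike {
  modality: Modality;
  payload: string; // raw utterance, tap target, gesture name, or typed text
}

// Normalize each modality into a single canonical intent string.
function resolveIntent(event: InputEventLike): string {
  switch (event.modality) {
    case "voice":
    case "text":
      // Voice and text share the same language pipeline.
      return event.payload.trim().toLowerCase();
    case "touch":
      return `activate:${event.payload}`;
    case "gesture":
      return `gesture:${event.payload}`;
  }
}

// One handler serves every modality; the user can switch input
// methods mid-task without the app losing track of the request.
const handled: string[] = [];
function handle(event: InputEventLike): void {
  handled.push(resolveIntent(event));
}

handle({ modality: "voice", payload: "Show me photos of Paris" });
handle({ modality: "touch", payload: "photo-42" });
```

The key design choice is that modality-specific details stay at the edges; everything past `resolveIntent` is modality-agnostic, which is what makes seamless switching possible.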
Example: Google Assistant Multimodal Queries

Combines voice commands with visual elements: a query like 'show me photos of my trip to Paris' displays relevant images, and touch input lets the user refine the results.
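The query-then-refine flow above can be sketched as two steps over one shared result set. This is a simplified, hypothetical model (the `Photo` type, sample data, and function names are invented for illustration): the voice query populates the visible results, and a subsequent tap narrows them without restating the whole request.

```typescript
// Hypothetical sketch of a multimodal query-then-refine flow:
// voice produces a visual result set, touch refines it in place.

interface Photo {
  id: number;
  tags: string[];
}

// Sample data standing in for a real photo library.
const library: Photo[] = [
  { id: 1, tags: ["paris", "eiffel"] },
  { id: 2, tags: ["paris", "food"] },
  { id: 3, tags: ["tokyo"] },
];

// Voice step: a broad spoken query populates the visible results.
function voiceQuery(term: string): Photo[] {
  return library.filter((p) => p.tags.includes(term));
}

// Touch step: tapping a filter chip refines the existing results,
// carrying the voice query's context forward.
function touchRefine(results: Photo[], tag: string): Photo[] {
  return results.filter((p) => p.tags.includes(tag));
}

const shown = voiceQuery("paris");          // "show me photos of Paris"
const refined = touchRefine(shown, "food"); // user taps the "food" chip
```

Because the touch step operates on the voice step's output, the modalities share one conversational state rather than starting over; that continuity is what makes the interaction feel natural.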