Multimodal Conversation Design Tutorial (Part 2): Best Practices, Use Cases and Future Outlook

Welcome to part 2 of our tutorial on multimodal conversation design. In part 1 we learned about the basics of multimodal design and its related inputs and outputs. Today we’ll dive into contextualized best practices, review a common use case and discuss what the future might hold for multimodal conversation design.

Contextual Design: Building Relevant and Customized Experiences

Context in multimodal conversation design is essential. We can’t think only in chat, only in voice, or only in visuals. We have to think about how these modalities complement each other and which one best serves the user in any given moment. Where is the user? What are they trying to accomplish? These should be the main considerations when working with multimodal design.

Knowing where users are while they progress through different steps of their journey can reveal both pain points and opportunities in design. This is especially true if the user journey requires switching between devices. Careful review of the user journey helps with understanding the advantages of one modality over another at various points. This type of review should take into consideration the entire user experience from beginning to end and map how those interactions come to life using a combination of modalities.

Cultivating a sense of safety and security in users’ interactions is also crucial for driving the engagement that leads to customer loyalty, and a multimodal experience can help achieve this. Speaking aloud and receiving an audio response make voice-first interactions inherently more public. If a use case requires users to share private or sensitive information, combining voice-first interaction with on-screen text and visual input may be more effective.

Contextual Design: The Process in Detail

It’s vital to recognize that this is not a case of designing chat or voice-first and then simply layering on graphical elements. Best practice requires us to understand and prioritize what a user is trying to do and to support that goal, rather than pushing or even forcing them to engage with a particular product or preferred interface. Often this means playing a supportive role. It requires critical reflection and honesty about whether a brand is truly committed to creating the most frictionless interaction for a user, or whether it is making justifications that inadvertently prolong user frustration. Which leads us to…

Multimodal Is More Than Flash

Multimodal conversation design combines multiple inputs and outputs to improve a user’s experience. Designers make life easier for users by incorporating and automating actions across different modalities. If a design offered only one modality, the user experience would suffer, and the design would “fail” in the mind of the user.

That said, when everything in a design competes for your attention, nothing wins. Too many elements in a user’s journey can push the UX into gimmicky territory. In multimodal conversation design, where audible, visible, and tactile elements all compete for attention, striking that balance is less straightforward. Each modality has its advantages; the key lies in emphasizing one at a time.

Each element must be intentional. It should have a purpose, not just flash. It’s not just about what’s available, it’s about whether what’s available is even relevant or appropriate. If users are distracted by an element, they’re not concentrating on the intended user journey. Intentionally designed experiences work, while others can immediately come off as overwhelming. There’s an art to bringing together visuals and audio to create seamless communications.

Well-thought-out multimodal conversation design also prioritizes accessibility. The power of a multimodal experience should not be underestimated. It can reduce difficulties, improve independence, and include more people in the conversation. This is why it’s critical to ask the right questions at the beginning of the design process to determine if any users are being excluded from the product. Can all users of a multimodal journey complete a task or get from A to B without major roadblocks? Answering these questions will help demonstrate that a brand cares about the needs of all of its users.

Multimodal Conversation Design: A Not-So-Simple Use Case

Let’s walk through an example of how multimodal design can improve a user’s experience of something as everyday as making a recipe. While it may seem simple on the surface, it involves multiple visual, audio, and touch interactions, sometimes simultaneously.

User verbally asks their smart display for a recipe.
Smart display verbally confirms that it can help and shows visual recipe results.
User manually scrolls through options on the smart display, taps for more details, and reads the recipe.
User verbally adds the recipe ingredients to the voice assistant’s shopping list.
Smart display verbally and visually confirms the shopping list update.
While driving to the store, the user remembers something else they need and uses an in-car smart assistant to verbally add it to their shopping list.
At the grocery store, the user doesn’t want to disturb others, so they use the mobile app’s GUI to read their shopping list, tapping to check off items as they go.
Back at home, the smart display remembers the selected recipe and explains the recipe steps both verbally and visually.
With their hands busy and messy, the user can verbally ask the smart display to repeat a step, set a timer, play music, etc.

Think of the roadblocks this same user would encounter with a rigidly audio-only or visual-only interface. Where might frustration arise without the ability to obtain or convey information through the modality best suited to the context?

How Multimodal Design Will Impact the Future of Customer Interactions

AI’s predictive abilities will continue to evolve through a variety of methods, from physical recognition cues to automatically checking schedules and drawing conclusions. Vocal and physical biomarkers will also provide additional context for choosing the best modality. We’re already seeing this in how smartwatches use biometric information for communication.

In Q1 2021, smart displays comprised 38% of worldwide smart speaker sales. Although screenless smart speakers lead the market for voice-enabled devices, the popularity of smart displays is quickly climbing. Amazon is tapping into this opportunity by building a large Echo Show to mount on the wall. Verizon recently announced it’s entering the market with its own smart display. As these devices make headway, companies must determine how these displays integrate with both their core services and broader smart home plans to provide users the best multimodal experience.

As discussed, multimodal design is inherently human-centric. Many are convinced that the next step toward a more human-like chat and voice experience is the inclusion of virtual humans. With a combination of three-dimensional bodies, expressive faces, and natural language understanding, virtual humans stand to advance multimodal design and further shrink the gap between human and technological interactions.

Multimodal Design: Excellent Branding Opportunities

As experiences become more interactive and multimodal, they will become more shareable in the design and product world. This means big opportunities for brands’ conversational experiences to earn the customer and industry attention they deserve as they become accessible to broader audiences. In turn, customers will come to expect multimodal interactions wherever they encounter conversation design.

With this in mind, a design team’s skill sets will need to expand as it’s no longer solely about designing for one modality; it’s about designing for multimodal functionality and experience. This means determining how the visual interface interacts with the context provided by text and speech interfaces, as well as how these different forms of interaction harmonize with one another.

Ultimately, multimodal design’s purpose is to contextualize and offer up options to provide users with the best interaction for the moment they’re in. Having easily navigated an experience, customers should walk away satisfied with both the specific interaction and the overall brand. And when it comes down to it, hasn’t that always been the objective?