“Why should I trust you Alexa?”

Exploring trust and sociability in conversational agents.

In this short essay, I want to highlight that how we design conversational agents such as Alexa and Siri and what characteristics we give them has a strong impact on trust and reliability. I also share four ideas for how a conversational agent could be designed with trust as a priority, something that is becoming increasingly important as this technology becomes more widespread and powerful. Consistency, clear capabilities, rethinking personality and offering explanations would solve many of the trust issues that arise in conversational agents today.

While writing my master’s thesis, I collected research and interviewed families about how they use these agents in their daily lives, in order to investigate the area and identify problems and concerns[1]. Hearing the families’ concerns and researching trust and responsible AI made it very clear to me that current design practice does not prioritize responsibility or trust.

The first point I want to make is that while humor and sociability may be important to a fun and pleasant interaction, they also affect how humans perceive the intellectual capability of the device they are interacting with. A gap between expectations and the actual capability of these devices leads to disappointment and trust issues in the long term.

Secondly, I wonder what characteristics conversational agents will have in the future and what the goal and purpose of these characteristics should be. If the goal is to design a responsible agent that we trust, how would we design it and what characteristics would we give it?

What is a conversational agent?

A conversational agent is a type of dialogue system built on natural language processing. The examples we encounter in our daily lives are often voice assistants such as Alexa or Siri, and chatbots acting as a sort of filter for technical support on websites. While writing my master’s thesis I became really interested in how we interact with these conversational agents in our daily lives and how we relate to them. It’s also important to remind ourselves that conversational agents are a kind of AI tool designed by humans. They do not contain any kind of magic and should be held to the same standard as any other product we find in our homes.

One of the first conversational agents to pass the Turing test in an international competition at the Royal Society in England was Eugene Goostman, a chatbot with the characteristics of a 13-year-old Ukrainian boy with poor English skills, created by a group of programmers from Russia and Ukraine[1]. These characteristics had been very carefully selected by the team of programmers in order to induce a sense of forgiveness in the person interacting with the agent. This is a common practice when designing robots and agents: we’re more likely to see past mistakes if the agent is childlike, and users often find it entertaining or cute. The personality, humor, and sociability of conversational agents such as Alexa have been designed and carefully selected by the engineers and designers working at these companies. These characteristics improve usability in the sense that the agent becomes more pleasant to talk to, and it’s safe to assume that the companies have tried to optimize the personality of the agent against whatever metrics they use to measure usability.

Tinny Tim from Futurama is a small, disabled childlike robot with a British accent that you feel sorry for throughout the series.

What does communication with a conversational agent look like?

Most of the communication between a voice-assistant-type agent such as Alexa and the user is one-way: commands asking the device to perform simple functions such as playing music, setting a timer, or simply stopping whatever it is doing. But a large part of the communication is conversational. One might ask what the largest mammal in the world is, and the agent will pull information from the web and communicate it to the user.

Data from an article by Sciuto et al (2018) [2]

Jokes make up about 1–5% of the interactions with the agent and are most common in the beginning, when the agent is new and exciting and the user is exploring its capabilities[2][3]. Jokes and other fun interactions quickly become predictable and boring.

It’s in this communication that the agent’s personality and characteristics come into play the most. How the agent communicates the information, how much of it, and with what tone varies widely between different devices and users.

Two anecdotes from my thesis

As part of my master’s thesis, I wanted to observe how children who can’t read or write use conversational agents to search for information and learn about topics that they like. To demonstrate how this can be done, I asked about the Eiffel Tower, and this is how the interaction went:

Me: Okay Google, how tall is the Eiffel Tower?
Google: 300 meters
Child: Okay Google, how tall is the Eiffel Tower?
Google: According to Wikipedia the Eiffel Tower is a 324 meter tall iron tower located in Paris…
Child: You didn’t say that the last time!!!

This is an example of how the communication can be inconsistent, and the child in the study was quite upset that the agent had lied to him. I wonder what design choices led to this interaction. I can imagine that the reasoning is that if the user asks twice they are probably more interested in the subject and want more information, and it would probably make sense in a mobile app context. But in a conversational context, this leaves the user wondering if the agent has withheld information in the past and if it is a reliable source of information in the future.

The second anecdote is a story one of the parents I interviewed told me.

Parent: I did a little test as an adult and talked with a friend about Google, about the depression many have today. We have Siri, we have hey Google, then you should be able to use them as a therapist or a psychologist instead of going to one. It did not go well. It went terribly, completely. Then you really see the limitations of the device.

This is a common experience for new users. The capability to joke, carry out some conversations, and act in a manner that seems complex and sociable gives users the impression that the agent is capable of much more than it actually is. The user then explores the capabilities of the agent, asking tough questions and even philosophical questions. They then very quickly realize that the agent is not as advanced as they first thought and that it is mainly an extension of the services that the company provides.

This harms trust because the expectations of the agent’s capabilities far exceed its actual capability. It often leaves the user quite disappointed in the interaction, confused about what the agent can actually do and why they thought it was smart in the first place. It also leads to other odd conversations where the user asks questions that are way beyond what the agent is capable of answering. A playful example of this is the following conversation a child had with the agent while learning about planets and dinosaurs.

Child: Do you know why planets must exist?
Google assistant: I can sadly not answer that question.
Child: Why are there people? Or dinosaurs?
Google assistant: Sorry, I can’t answer that question.

This kind of conversation with an agent happens more often than you would think[4]. In most cases, there is no harm done in these conversations, but other studies have shown that people ask sensitive questions regarding their mental health and ask for help when they have to make tough decisions[5]. In these cases, the response that the agent gives is more important, and designers should be critical of how agents respond in these situations.

Imagining a trustworthy conversational agent.

The design goal of many conversational agents today is something akin to usability, and the metrics used to evaluate these products are the number of interactions, purchases, and surveys. There’s certainly also machine learning involved, using the vast amount of data collected during conversations and trained with those same metrics in mind. A problem with this is that machine learning that relies on human feedback risks tricking humans into believing it’s doing a good job when in reality it has found a shortcut[6]. My point is that it’s important to sit down and evaluate whether the agent is actually improving in terms of usability and trust, and not only appearing to.

In the paper, a robot trained with human feedback developed a way to trick the human evaluator into believing that it was grasping the ball when it was in fact only grasping air in front of the camera[6].

How would a conversational agent be designed with a different design goal in mind? How would we design a conversational agent that is built to optimize for trust instead of usability? I have a few suggestions.

1. Communicate capabilities clearly and don’t oversell them

Voice assistants such as Alexa, Siri, and Google Home are interfaces for their respective ecosystems and services. This should be clear when the user adopts one of these tools, but too often they are marketed as personal assistants that are there to help with whatever you need: all-in-one assistants powered by artificial intelligence and invented by engineers in Silicon Valley.

An example of ambiguous marketing by Google

It’s better to communicate what the voice assistant is actually used for and what it is capable of doing, such as playing music, accessing smart-home functions, and helping you cook. A trustworthy voice assistant would be marketed as a small computer with a voice interface that is excellent at pulling information from the web and helping you keep track of small tasks. When asked what the meaning of life is, it should either make it very clear that it is performing a Google search with the keywords “what is the meaning of life” or answer that it does not have the capability to answer the question.
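To make this suggestion a little more concrete, here is a minimal sketch of how an agent could route a request. It is only an illustration under my own assumptions: the `CAPABILITIES` table, the `respond` function, and the wording of the replies are all hypothetical and do not describe how any real assistant works.

```python
# Hypothetical capability table. The entries are illustrative only,
# not any vendor's real feature list.
CAPABILITIES = {
    "play music": "Playing music.",
    "set a timer": "Setting a timer.",
}


def respond(utterance: str, can_search: bool = True) -> str:
    """Answer from declared capabilities, or say plainly what happens instead."""
    text = utterance.lower().strip().rstrip("?!.")
    for phrase, reply in CAPABILITIES.items():
        if text.startswith(phrase):
            return reply
    if can_search:
        # Be explicit that this is a web search, not an answer the agent "knows".
        return f'I can\'t answer that myself. I am running a web search for "{text}".'
    return "I don't have the capability to answer that question."


print(respond("What is the meaning of life?"))
```

The point of the sketch is the last two branches: anything outside the declared capabilities is either labeled as a web search or declined, never dressed up as knowledge.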

2. Consistent communication

In everyday conversations, we appreciate when people mix it up. I personally struggle with coming up with unique phrases to end emails with so that it doesn’t look like a bot has written them. In the same sense, conversational agents try to vary their language in order to appear more human. The trustworthy voice assistant would have to distinguish between communication that is factual and communication that is sociable. When asked how tall the Eiffel Tower is, the agent should always give the same answer, but in other cases, such as how it phrases a greeting or an apology, the communication can be more varied, for example:

Google assistant: I can sadly not answer that question.
Google assistant: Sorry, I can’t answer that question.
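The split between factual and sociable communication could be sketched roughly like this. Everything here is an assumption made for illustration: the `FACTS` store, the `APOLOGIES` list, and the `normalize` and `answer` functions are hypothetical names, and a real agent would not hard-code its answers.

```python
import random

# Hypothetical fact store: exactly one fixed wording per factual question.
FACTS = {
    "how tall is the eiffel tower":
        "According to Wikipedia, the Eiffel Tower is 324 meters tall.",
}

# Sociable phrases may vary without changing what the user learns.
APOLOGIES = [
    "I can sadly not answer that question.",
    "Sorry, I can't answer that question.",
]


def normalize(question: str) -> str:
    """Lowercase the question and drop trailing punctuation and spaces."""
    return question.lower().strip().rstrip("?!. ")


def answer(question: str, rng=None) -> str:
    """Return the same wording for a known fact, a varied phrase otherwise."""
    rng = rng or random
    key = normalize(question)
    if key in FACTS:
        return FACTS[key]  # factual: always identical
    return rng.choice(APOLOGIES)  # sociable: allowed to vary


print(answer("How tall is the Eiffel Tower?"))
print(answer("Why are there people?"))
```

With this split, the child in my anecdote would have heard the same height both times, while the agent could still vary how it says that it cannot answer.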

3. Re-examine the purpose of personality, language, and tone

Siri, Alexa, Cortana, and the chatbots you encounter online have been very carefully designed. Every aspect of them can be tweaked however the designers want. The voice could be that of Arnold Schwarzenegger, or the agent could encourage you to speak in another language, but the current voice and personality of Siri are what Apple has decided to use. In the example of Eugene Goostman, the agent pretended to be a young child to pass more easily as human in a Turing test, but in the case of a voice assistant, we know that it is a small computer. A conversational agent appearing human is impressive and is often seen as a sign of quality, but giving it too many human qualities can make it fall into the uncanny valley [7].

There are also many interesting design prototypes on this subject, for example, a genderless voice for conversational agents. Gender is something that we assign the device anyway.

Our imaginary trustworthy voice assistant would have a personality and voice aligned with its capabilities and purpose. It should be clear from the way it speaks that this is a robot that can help you select music hands-free. Perhaps it should follow the same design tradition as GPS devices, prioritizing clarity and being heard over the noise of other cars. But I would love to hear a genderless and clearly robotic voice with human-like behavior in action.

A project at the Umeå Institute of Design explored the concept of a voice assistant with a completely different language that the user has to learn in order to talk with the assistant. This forces the user to re-evaluate their relationship to the device and also provides a sense of privacy for the user, knowing that it can’t understand the language they are speaking in its presence[8].

4. Take responsibility for what it says and offer an explanation

Finally, the conversational agent should be designed with accountability and responsibility in mind. The agent interprets voice input and translates it into actionable commands that the computer can execute. Our trustworthy AI should make this process transparent to the user, making it clear what it heard and what function it executes. During or after a voice command the user should be able to ask how and why it responded in the manner that it did.

An example:

User: Okay agent, how tall is the Eiffel Tower?
Agent: According to Wikipedia the Eiffel Tower is a 324 meter tall iron tower located in Paris…
User: Okay agent, explain yourself.
Agent: What I heard was you asking for the height of the Eiffel Tower. I navigated to the Eiffel Tower’s Wikipedia page and pulled the information regarding the height of the structure. I processed this information, generated a shorter version, and communicated it to you.

This information can be overwhelming, and the person who created this function can probably break it down into many smaller steps. What is important is that the user can ask for an explanation and see that this is a piece of software with its own flaws and strengths, not an intelligent being simply explaining something it remembers. If the answer is wrong, it’s the fault of the engineers and designers and not the agent.
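One way to imagine the "explain yourself" function is for the agent to record each step it takes while producing an answer, so that the record can be read back on request. The following is a rough sketch under my own assumptions: `SOURCES`, `TracedAnswer`, and `answer_query` are hypothetical, and the lookup is a hard-coded stand-in rather than a real web request.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for a web lookup. A real assistant would fetch
# and parse a page instead of reading from a dictionary.
SOURCES = {
    "eiffel tower": (
        "Wikipedia",
        "The Eiffel Tower is a 324 meter tall iron tower located in Paris.",
    ),
}


@dataclass
class TracedAnswer:
    """An answer together with the steps that produced it."""
    text: str
    steps: List[str] = field(default_factory=list)

    def explain(self) -> str:
        # Read the recorded steps back as a numbered explanation.
        return " ".join(f"{n}. {step}" for n, step in enumerate(self.steps, start=1))


def answer_query(heard: str) -> TracedAnswer:
    steps = [f'What I heard was: "{heard}".']
    topic = next((t for t in SOURCES if t in heard.lower()), None)
    if topic is None:
        steps.append("I found no source I could use for this.")
        return TracedAnswer("Sorry, I can't answer that question.", steps)
    source, passage = SOURCES[topic]
    steps.append(f"I looked up the {source} entry for '{topic}'.")
    steps.append("I pulled the passage that mentions the height.")
    steps.append("I passed that passage on to you.")
    return TracedAnswer(f"According to {source}: {passage}", steps)


reply = answer_query("How tall is the Eiffel tower?")
print(reply.text)
print(reply.explain())
```

The explanation here is just the log of what the software did, which is the point: it shows a process with steps that can fail, not a mind recalling a fact.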

Hedonismbot from Futurama


Designing conversational agents with responsibility, transparency, and trust in mind is growing in popularity, and people are starting to realize that it needs to happen for AI to be developed sustainably and ethically. I hope this article is read by someone who works with conversational agents and that they think of these aspects when they design the chatbots and voice assistants of the future.

And I have much more to write. There are many questions I have only touched upon that I want to explain more thoroughly, such as: What is trust in a conversational agent? What data does a voice assistant collect, and should you be worried about it? What are the dangers of powerful natural language processing such as GPT-3 (https://en.wikipedia.org/wiki/GPT-3), and what are the implications for consumers and scientists?


[1] Horned, A. (2020). Conversational agents in a family context: A qualitative study with children and parents investigating their interactions and worries regarding conversational agents. Department of Informatics. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-172348
[3] Sciuto, A., Saini, A., Forlizzi, J., & Hong, J. I. (2018). “Hey Alexa, what’s up?” Studies of in-home conversational agent usage. DIS 2018: Proceedings of the 2018 Designing Interactive Systems Conference, 857–868. https://doi.org/10.1145/3196709.3196772
[4] Ammari, T., Kaye, J., Tsai, J. Y., & Bentley, F. (2019). Music, Search, and IoT: How people (really) use voice assistants. ACM Transactions on Computer-Human Interaction, 26(3), 1–28. https://doi.org/10.1145/3311956
[5] Druga, S., Breazeal, C., Williams, R., & Resnick, M. (2017). “Hey Google is it ok if I eat you?” Initial explorations in child-agent interaction. IDC 2017: Proceedings of the 2017 ACM Conference on Interaction Design and Children, 595–600. https://doi.org/10.1145/3078072.3084330
[6] Luger, E., & Sellen, A. (2016). “Like having a really bad PA”: The gulf between user expectation and experience of conversational agents. Conference on Human Factors in Computing Systems: Proceedings, 5286–5297. https://doi.org/10.1145/2858036.2858288
[8] Troshani, I., Rao Hill, S., Sherman, C., & Arthur, D. (2020). Do We Trust in AI? Role of Anthropomorphism and Intelligence. Journal of Computer Information Systems, 1–11. https://doi.org/10.1080/08874417.2020.1788473

“Why should I trust you Alexa?” was originally published in Chatbots Life on Medium, where people are continuing the conversation by highlighting and responding to this story.