The language and tone of voice of our conversational content have a profound impact on the relationship between a modern organization and its customers. This is as true today as it was in the past. Rhetoricians and performers like Plato, Cicero and many others built their reputations on their ability to captivate an audience. However, where content guidelines for traditional (digital) media could be managed with a relatively simple style guide, conversational AI is much less predictable, since it involves real-time user responses and unexpected turns. This two-way dynamic requires a more holistic approach to content delivery.
As a conversation designer I can still learn a thing or two from this rich discourse of rhetoric, and in this article I’ll investigate what our ancestors thought were the main characteristics of great rhetorical performances.
The first point of order is to understand that even though rhetoricians and other performers were speaking to, and not with, an audience, they were constantly aware of the effects of their performance on that audience. The delivery of a performance was of the highest importance in Greek and Roman times, because it was how an orator could influence his audience. The most famous example of an orator’s influence is Cicero’s Catilinarian orations. In a series of speeches delivered on multiple occasions he successfully convinced the Roman Senate of Catiline’s plot, saving the Roman Republic from his treachery and from systemic collapse.
Through the ages, the rhetorical discourse expanded into the delivery of vocal and instrumental music, stage acting and musicals, and today the advent of conversational AI allows for a new level of emotional connection between users (the audience) and a computer (the performer). Unfortunately the quality of chat and voice experiences isn’t always up to par. As conversation designers it is our job to prevent users from getting frustrated and to reach high satisfaction scores.
Quite a lot of research has been done on the emotional connection between humans and computers. We know that users interact differently with chatbots, voice assistants and other human-like virtual experiences than with a website or app. The technology of Natural Language Processing enables us to create conversational agents with smart, human-like responses that bring about feelings of familiarity and empathy among their users. This tendency to perceive human qualities in something that is not human is closely related to pareidolia. As a result, organizations are starting to recognize that these experiences are ideally suited to both convey a logical answer and build emotional connections between their virtual assistant and their customers.
As conversation designers, we often have to make assumptions as to what a user might say to our assistant. In an ideal world the conversational copy is well-researched, and both happy and non-essential paths are written and implemented before the assistant goes live. But in reality, it doesn’t quite work like this due to constraints in time and resources. The modern, lean organization requires us to push our assistant live as early as possible. Secondly, we know that users sometimes prefer to be prompted with buttons rather than an empty line to write on. For these reasons, conversation designers often end up designing an interaction between two people rather than only thinking up our virtual assistant’s lines, and then improving it later based on a review of the conversational logs. This approach helps us predict and strengthen the effects of pareidolia with our users, resulting in better experiences.
The above considerations teach us that a holistic approach to content and delivery is essential for a great conversational experience, and lucky for us, the classical rhetoricians were experts in the field. Each period and author has a unique view on what drives a good performance, but in general they agree that four principal elements make a good rhetorical performance: audience and affect, delivery, structure, and ornamentation and repetition. I’ll summarize these below, and hope to explore each of them in more detail in future blog posts.
Audience and Affect
This element emphasizes awareness of the audience. The performer needs to understand the audience’s demographic, language, status, and size. The goal of this rhetorical element is delectare: to charm and amuse. The language, time and place, style and décor should add to the audience’s delight.
Affect can be summed up in the French saying “C’est le ton qui fait la musique”, meaning that our tone of voice has a profound impact on our delivery. The tessitura is the relative position of the voice within its natural range, and we can embellish it with intentional intervals and changes in the speed of our delivery. In modern days we see designers of text-to-speech content for voice assistants use different base voices and SSML to create perfect affect.
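To make the SSML remark a little more concrete, here is a minimal sketch of how a designer might wrap a line of copy in SSML to slow the delivery, raise the pitch slightly and add a pause. The helper name `to_ssml` and its parameters are my own illustration, not part of any platform’s API. The `speak`, `prosody` and `break` elements come from the W3C SSML specification, but support for individual attributes and values varies between voice platforms, so treat the values as assumptions to check against your own text-to-speech engine.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text: str, rate: str = "medium", pitch: str = "+0%", pause_ms: int = 0) -> str:
    """Wrap plain text in SSML with one prosody setting and an optional trailing pause.

    `to_ssml` is a hypothetical helper for illustration only.
    """
    # escape() protects characters such as & and < that would break the XML.
    body = f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{escape(text)}</prosody>"
    if pause_ms > 0:
        # A short silence after the sentence, expressed in milliseconds.
        body += f'<break time="{pause_ms}ms"/>'
    return f"<speak>{body}</speak>"


greeting = to_ssml(
    "Welcome back! How can I help you today?",
    rate="slow",
    pitch="+5%",
    pause_ms=300,
)
print(greeting)
```

Keeping the affect settings in parameters rather than in the copy itself makes it easier to try different deliveries of the same line with real users.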
Allegory and symbolism create strong images in our audience’s minds. In his fourth Catilinarian oration Cicero paints a heroic picture of himself, when he says “…the Gods have determined that he should snatch the senators and the people from miserable slaughter, their wives and children and the vestal virgins from most bitter distress, the temples and shrines of the gods, and that most lovely country of all of them, from impious flames, all Italy from war and devastation.” (reference link, from p.57).
We don’t know if the Roman Senators actually believed this to be true, but the allegory of pain and devastation definitely made it easier for his audience to grasp what the loss of the values and structure of the Republic could mean to them. We don’t need to be overly dramatic — allegories of a more benign nature already have a strong effect on the engagement level of our chatbot users.
Delivery
The delivery, or pronunciatio, describes the linguistics and style of written and spoken language. Most rhetoricians considered this the most important skill in a performer’s toolkit. It involves (among others) the performer’s variety, dynamics, humor, physical appearance and articulation. In Conversational AI (CAI) we know that variety in answers adds to the human-likeness of the agent and leads to higher user satisfaction rates. We want a multi-turn conversation with our agent to hit the right tone at every stage. This is where dynamics come in. Humor helps to lighten the mood, and when jokes are crafted to fit the brand and bot persona they build great engagement.
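Variety in answers is easy to sketch in code. The example below is one possible approach, assuming we simply keep a handful of hand-written variants per message and never serve the same one twice in a row. The class name `VariedResponder` and the sample fallback lines are hypothetical, and most bot platforms offer their own mechanism for response variations.

```python
import random
from typing import Optional, Sequence


class VariedResponder:
    """Pick a response variant at random, never repeating the previous pick."""

    def __init__(self, variants: Sequence[str], seed: Optional[int] = None) -> None:
        if len(set(variants)) < 2:
            raise ValueError("need at least two distinct variants to vary the wording")
        self._variants = list(variants)
        self._rng = random.Random(seed)
        self._last: Optional[str] = None

    def next(self) -> str:
        # Exclude the variant used last time so consecutive turns always differ.
        choices = [v for v in self._variants if v != self._last]
        self._last = self._rng.choice(choices)
        return self._last


# Hypothetical fallback copy for when the assistant does not understand the user.
fallback = VariedResponder([
    "Sorry, I didn't catch that. Could you rephrase?",
    "I'm not sure I understood. Can you say that another way?",
    "Hmm, that one is new to me. Could you try different words?",
])
for _ in range(3):
    print(fallback.next())
```

Excluding only the previous variant is a deliberately simple rule. A longer memory, or variants tied to the stage of the conversation, would be a natural next step for handling dynamics.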
Physical appearance can be translated to the chat UI: is it a widget, is it right in the middle as a part of your webpage, or is it an in-app experience? Is it styled in the brand colors?
For our voice assistant it’s probably even more important to understand how looks impact our delivery. Does the assistant live in a home pod with screen, or without? Is it cool and effective or warm and cute? Can it move? Does it have an avatar in 2D, or maybe even in 3D for the metaverse?
Structure
Historically, a good artistic structure consists of three main elements: exordium, narratio, and conclusio. We typically refer to these elements as the introduction, the statement of facts, and the closing statement. The narratio (statement of facts) is all about instructing the audience in important truths.
The exordium and conclusio are known to make or break a performance. The messages and emotional charge need to be performed with great care so as to reach the desired effect. For me the description of these three parts is how conversation designers should approach bots in the FAQ phase. In these conversations we should engage users right from the start, then convey the logical answer to their question, and finally use the closing ‘statement’ to give extra meaning to the conversation and stir brand engagement.
Quintilian (p153) adds three more elements to the mix which are very interesting from a conversation design perspective. First, he mentions divisio, which outlines for the audience the points that are still to come. In chat and voice, it’s very important to set expectations before engaging in a long conversation or process with the user. For example, we could start a process by saying “I can help you replace your credit card. I’ll need to ask you 4 questions, and this will take about one minute. Shall we start?” By laying out the next steps in the interaction we foster our user’s understanding and patience.
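The divisio message can be generated from the flow itself, so that the promise we make always matches the number of questions that follow. This is a minimal sketch under my own assumptions: the function name `divisio_prompt`, the 15 seconds per question estimate and the sample questions are all hypothetical and only there for illustration.

```python
import math
from typing import Sequence


def divisio_prompt(task: str, questions: Sequence[str], seconds_per_question: int = 15) -> str:
    """Announce how many questions follow and roughly how long they will take."""
    count = len(questions)
    # Round up to whole minutes and never promise less than one minute.
    minutes = max(1, math.ceil(count * seconds_per_question / 60))
    question_word = "question" if count == 1 else "questions"
    minute_word = "minute" if minutes == 1 else "minutes"
    return (
        f"I can help you {task}. I'll need to ask you {count} {question_word}, "
        f"and this will take about {minutes} {minute_word}. Shall we start?"
    )


# Hypothetical questions for the credit card example.
card_questions = [
    "Which card would you like to replace?",
    "Was the card lost, stolen, or damaged?",
    "Should we send it to your registered address?",
    "Would you like to keep the same PIN?",
]
print(divisio_prompt("replace your credit card", card_questions))
```

Deriving the count from the list of questions means the opening line stays accurate when someone later adds or removes a step in the flow.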
Confirmatio is another element that Quintilian mentions. This part delivers evidence that helps confirm the audience’s propositions. Providing proof can be very important to maintain trust between an AI and a human. An example could be when a ‘stop smoking’ bot claims that non-smokers live longer. A link to a WHO or NHS statement could help the user’s trust and determination.
Excited, that’s what I’m feeling after going through all of this. A virtual assistant is so much more than a tool for automation. The European Renaissance saw a revival of the Greek and Roman classics, recognizing their wisdom. I have seen enough evidence to believe that classical rhetoric will help us design better user experiences and stronger brands. Is it too bold to say that we’re at the starting point of the ‘renaissance of the classics in Conversational AI’? In any case, I will continue to explore how to combine this discourse with the modern skills of conversation design and UX design. Maybe one day, my digital assistants will be as convincing as Cicero in his Catilinarian orations.
Note: Many of the insights in this article derive from ‘The Weapons of Rhetoric’ — Judy Tarling (2004). It’s recommended reading for anyone interested in this topic.
How the Western rhetorical tradition strengthens UX of today’s chatbots and voice assistants was originally published in Chatbots Life on Medium.