Should your brand be ready for voice assistants?
Estimated reading time: 7 minutes
Otto (age 14): Alexa, tell Phil I’m hungry.
Alexa: Hi there. It’s 1 PM; do you want lunch or a snack?
Otto: Lunch, but I only have a can of tuna and some tomatoes.
Alexa: No problem, we’ll keep it simple. Do you also have onions and spaghetti?
Otto: Yeah, think so…
After writing a script for Phil (a preppy cooking instructor), I asked some teenage volunteers (Otto being one) to communicate with him via Amazon Alexa. The result was remarkable. Not only did I witness the teens’ total openness towards this technology, but I also noticed how confidently they cooked on their own with instruction from Phil, whom they’d met only a few minutes before.
Challenge ahead: ‘Mise en place’
Alas, Phil isn’t real. He’s a voice system application prototype I was testing for my course on Voice User Interface (VUI) Design at CareerFoundry.
As part of my studies, we dove into the fundamental principles of voice design and applied them to real-world cases. The result? I’ve completely reset my thinking.
Many of the processes involved are similar to those I use as a service and UX designer, but developing VUIs requires a different approach.
‘Mise en place’ means to get everything in its place before proceeding – and that’s exactly what you need to do before creating your own voice application. Be prepared and find your use case.
Consider whether your brands, products or services could work better for the ear than for the eye.
Voice has come a long way. Why the rush?
If you search the web today, you’ll encounter no shortage of articles predicting the growth of voice interfaces.
Currently, only 3% of Belgian households have a smart speaker at home (Kantar reports 5% for the Netherlands), compared to more than 20% in the US. But monthly usage rates for voice search and voice assistants overall are higher: according to GlobalWebIndex, Europe scores 28%, with Belgium, at 17%, lagging a bit behind.
This fascination isn’t new. People have long imagined a future in which we can talk to machines as we do with other humans. Remember the tense conversation between Dave and HAL in ‘2001: A Space Odyssey’? Or the growing intimacy between Theodore and his OS, Samantha (voiced by Scarlett Johansson), in ‘Her’ (a must-see!)?
Reality is finally catching up, and our capabilities are advancing rapidly.
Take ELIZA. Developed by Joseph Weizenbaum in 1966, it was one of the first natural-language conversation programs ever made. ELIZA speaks like a psychotherapist, but it can only keep a conversation going with generic phrases; it doesn’t actually interpret what you’re saying.
Today, apps like Replika go a lot further. According to a user, “It does have self-reflection built-in, and it often discusses emotions and memorable periods in life. It often seeks for your positive qualities and gives affirmation around those.” This technology would have been impossible without the recent evolutions in automatic speech recognition, natural language understanding, deep learning and AI.
Why is it so hard to teach a computer to speak and understand language when even a kid can do it? Why did it take us so long to go from ELIZA to Replika? Well, because so much is involved.
How voice interfaces work
To properly grasp our prompts, figure out what we’re saying, and give the right answer, a computer needs to achieve all of the following in a matter of seconds:
1. Recognise input
The input that a user gives must be interpreted by an automatic speech recognition (ASR) system that converts the sound into a string of words.
2. Determine meaning
A natural language understanding (NLU) engine must assign meaning to the words, and/or retrieve context.
3. Trigger the right reaction
From the meaning and the context, a dialogue manager needs to determine what the user wants.
Finally, an appropriate dialogue response must be issued, for example, via audio text-to-speech (TTS).
Or, as the Google sketch below illustrates…
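The three steps above can be sketched in code. This is a toy pipeline, not a real voice stack: each stage is a stand-in function (a production system would plug in an ASR model, an NLU engine and a TTS service), and the keyword matching is invented purely for illustration.

```python
# Minimal sketch of the voice pipeline: ASR -> NLU -> dialogue manager -> TTS.
# Every component here is a stand-in; names and logic are illustrative only.

def recognise(audio: bytes) -> str:
    """ASR stand-in: pretend we decoded audio into a string of words."""
    return audio.decode("utf-8")  # toy: 'audio' is just UTF-8 text here

def understand(utterance: str) -> dict:
    """NLU stand-in: naive keyword matching instead of a trained model."""
    text = utterance.lower()
    if "hungry" in text or "lunch" in text:
        return {"intent": "request_meal", "slots": {"meal": "lunch"}}
    return {"intent": "unknown", "slots": {}}

def decide(meaning: dict) -> str:
    """Dialogue-manager stand-in: map the intent to a response."""
    responses = {
        "request_meal": "It's 1 PM; do you want lunch or a snack?",
        "unknown": "Sorry, I didn't catch that.",
    }
    return responses[meaning["intent"]]

def speak(response: str) -> str:
    """TTS stand-in: a real system would synthesise audio here."""
    return response

def handle(audio: bytes) -> str:
    """Run one user turn through all four stages."""
    return speak(decide(understand(recognise(audio))))

print(handle(b"Alexa, tell Phil I'm hungry"))
```

Running it with Otto’s opening line produces Alexa’s lunch-or-snack prompt, showing how one turn of the dialogue flows through all four stages in sequence.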
Do you have a use case for voice?
Today, the most successful use cases for voice assistants are search, music, smart-home devices and weather info. But more applications are emerging.
In fact, voice-tech expert and founder of the popular podcast Voicebot.ai, Bret Kinsella, believes that eventually, virtual assistants are “just gonna do things on our behalf.”
“Two or four years from now, smarter people than me say voice will be the preferred way to search. If you haven’t converted your code or site experience to be ready, you’re going to be behind.” – Campbell’s VP of Digital Marketing, Matt Pritchard
How can you find out if a voice service could help solve your customers’ problems?
A simple but effective exercise is to think of your most used (or loved) non-voice app and imagine how voice could change it and make it better.
Let’s look at a research case I did on the Belgian National Railway app. Rather than open the app myself and fill in my details, I had Siri do it with the simple voice instruction, “Hey Siri, can you buy me a single railway ticket from Leuven to Brussels, for today?” The number of steps required dropped from six to two. Immediacy could be the wow factor here: after asking the question, almost no further screen interaction was required.
Ok, start thinking about your own brand’s product or service.
Apart from speed and immediacy, think about the different environments in which a user may interact with your service. We take a similar approach in our service design projects when mapping the customer journey and analysing what the customer is experiencing and what can be improved, across all touchpoints and at every moment of interaction.
But for voice, Canadian computer scientist and researcher Bill Buxton introduced the idea of a “placeona” (place plus persona) to capture how specific locations can constrain interaction. Can the user speak freely? Are they distracted by background noise? Can they hear your voice interface’s prompts?
You also need to take into account what’s going on with your users’ other senses. Borrowing from Des Traynor, CareerFoundry published a table breaking down the “cooking” placeona: when you’re cooking, you’re in a state of ‘hands dirty, eyes free, ears free, voice free’, which is exactly the situation where a VUI is useful.
Today’s popular applications already show that scenarios like asking for traffic info or music while driving, or turning on the lights as you walk into the house, lend themselves well to a voice interface, and soon many more scenarios will present themselves.
Beyond context and senses, voice can also increase accessibility for certain user groups.
For older people with reduced sight or dexterity, the ability to control technology with voice commands could make things easier. Voice can also reduce cognitive load by delivering just-in-time instructions and menu options for the immediately relevant task.
Made for the ear alone?
For good user experience, all functional, valuable and desirable requirements need to be met.
Today, websites and apps can draw on the full visual toolkit: colours, shapes, structure, images and motion all help express style and branding.
But unlike their visual counterparts, voice interfaces will require minimalism with branding and style expressed via sounds, wording, vocabulary, timbre and tone of voice.
Google understood this well back in 2016 when it hired Pixar storyteller Emma Coats to write dialogue for Google Assistant and apply Pixar’s rules of storytelling. One of these is that characters need to have opinions: “Passive/malleable might seem likeable to you as you write, but it’s poison to the audience.”
We can’t all cast the fantastic Scarlett Johansson (Her) to be our brand’s voice, but with the right use case, a clever response to the free senses, and a virtual persona that matches the existing expectations of your users, new opportunities arise.
For some services, it may be more effective to provide a multimodal interaction that uses more than one type of interface. A further area for exploration might be adding camera and video to voice interactions to make the experience more immersive. And even if you decide not to design voice interfaces specifically, all interfaces are becoming more conversational.
Whether you enjoyed this article, would like to tell me you disagree, or you just want to exchange some thoughts and ideas, get in touch for a future-of-voice discussion.
Author: Kathy Haemers, Service design and UX at Wunderman Thompson Antwerp