The future of voice is for everyone

Tim Smith, Design Director & Managing Partner

VOICE Global, which billed itself as a 24-hour livestream at the forefront of conversational development, design and artificial intelligence platforms, was a virtual event held this month with the ambitious aim of streaming voice technology content for an entire day… and it almost reached those lofty heights, were it not for some technical issues.

Hi Mum! Said Dad is a leading digital product and innovation agency, pioneering the emerging technology of voice interaction - and I can say that without any sense of bragging, because I only joined HMSD this month and have yet to play any part in its impressive success in voice. So we were more than happy to join the virtual technology festival from our homes, gardens and sheds. What we observed was a strong and exciting human voice in Voice.

Voice is what makes us human...

...And I don’t just mean the voice box’s ability to verbalise our opinions on the weather - having a voice is a fundamental human right, and a tool that gives us the power to share an opinion and to drive change. Each human voice is as important as the next. In a time shaped by the Me Too and Black Lives Matter movements, and by renewed consideration for people with disabilities during COVID-19 lockdown, voice is on everyone’s minds and tongues.

So now seems a fitting time to be talking about voice, voice interaction and voice technology, and the VOICE Global event brought together a diverse range of experts and innovators in this space. It was heartening to hear how much voice matters to everyone who spoke at the event, and to see their ambition to champion it both as part of who we are and as a technology that can help a diverse range of people be heard, some for the first time.

Human-centred voice design

According to Maaike Coppens at Promptful, human-centred voice design should embody four main human traits: Empathy, Emotion, Connection and Relatability. We developed all of these traits to communicate with one another, so for an interaction to feel positive, your Alexa or Google Assistant should display them too. In both the human user and the AI, these traits come through largely in intonation, which can completely change the meaning of a string of words and, used well, helps ensure a smooth user experience.

Juergen Schmerder, Director of Experience AI at Mercedes-Benz R&D, took this one step further and introduced the idea of giving voice AI a personality; this is particularly interesting when it comes to branding. Mercedes-Benz are developing their own voice UI, known as ‘Hey Mercedes’, as part of their innovative new(ish) MBUX in-car digital platform. What personality would a Mercedes AI have? How might it become friendly and familiar to a Mercedes driver?


These are some of the exciting challenges that Juergen and his team are working on. One might wonder how Mercedes-Benz can beat the likes of Alexa and Google Assistant - but my view is that voice is an opportunity for car OEMs to win back some of their lunch from the tech companies. Voice interaction in the car is one of the most meaningful applications of the technology, given that your hands and eyes should be busy driving. Consumer adoption is built on trust, so if a user's first good experience with voice happens in the car, where it matters most, that user will come to trust that AI over others. They may then want to say "Hey Mercedes" on a plane, in the kitchen or at a hotel. Given that Mercedes-Benz has the user's attention in the car on the way to the airport, on the plane through its partnership with Lufthansa, and again at the Mercedes-Benz hotel at the destination, you can imagine a seamless, personal Hey Mercedes voice experience across the entire journey.


Accessibility and equity through voice

There are many technologies out there that, often unknowingly, discriminate against a particular group of people. Whether it is train platforms that are inaccessible by wheelchair, or IBM’s facial recognition technology, recently cancelled over concerns about racial bias, a lot of technology has discrimination built into its very design.

Voice, however, has the potential to be a medium for change and for equity (meaning that the needs of every group are considered so that everyone can reach the same standard). As Carissa Merrill, Senior Experience Architect at U.S. Bank, put it: “Robust speech technology is a great equalizer, and democratizes information”. The key word in this quote is robust: as The New York Times reported earlier this year, the speech recognition systems from Amazon, Apple, Google, IBM and Microsoft already show bias, misunderstanding Black speakers markedly more often than white speakers.

Voice technology harnesses machine learning and natural language processing to learn from its own interactions with humans - with people, with us, every single one of us. At its best, the technology doesn’t care whether you are male or female, a child or an adult, whether or not you have a disability, or what colour your skin is: everyone with a voice can have a voice, equal to everyone else’s, thanks to voice technology.

Our voice expert Ray's point of view


As a platform, voice assistants are still in their early stages, but it's hard to ignore where they excel. We've spoken to users and heard a common theme echoed: "It's hands-free, and it just does what I tell it to". The satisfaction of having a voice assistant respond to your command is a powerful thing, and freeing up your hands to cook in the kitchen or to focus on driving is a huge deal.

But voice assistants face a few challenges. Most people use them for utilities and built-in functionality, whilst 3rd party voice skills struggle with discoverability. So in a landscape where so many voice skills exist, how do we get them in front of users so they can actually get to work helping?

Our BBC Good Food skill started as a 3rd party skill, where we aimed to tackle a real-world problem: how can we help users cook hands-free, so they don't have to dart back and forth between the cutting board and their mobile phone? But having a great solution is one thing; making the everyday Joe aware of it is another. The challenge all 3rd party skills share is that a user needs to know the exact words, 'Alexa, open [skill name]', to start the experience.

Working with Amazon, we know that they look for best-in-class experiences. So, when Amazon picked up on the value of our skill, it was added to their platform as a 1st party experience. This allowed more natural phrases, such as 'Alexa, find me a chicken recipe', to link to our skill. Not only did this do wonders for discoverability, it also means our voice experience gets to work faster, helping with everyday life.

As Tim mentioned, Empathy, Emotion, Connection and Relatability are key traits of a good experience. So whilst you might have something that's genuinely useful for users, there are a few flourishes you can add to create a unique and more human experience. We've used Amazon's Neural Text-to-Speech to give skills like The Gruffalo a softer, more natural-sounding voice, because that bedside-storytelling feel is key to keeping younger kids immersed.

In a nutshell, I believe there's great value in voice as a unique platform, and its future is growing and promising. There are a few hurdles in the way, but any cool new tech platform is usually a marathon of learnings and improvements, not a sprint. The sooner voice experiences meet a gold standard and can be easily discovered by people like you and me, the sooner they can become more intuitive and smarter, making everyday life that much more enjoyable.


If you're thinking about how a voice experience could work for your brand, please get in touch. We'd love to hear from you.