Voice Trends at the Start of the Decade

Shaun Withers

Voice Trends at the Start of the Decade: Heard from Las Vegas to Chattanooga

Shaun Withers and Milkana Brace at CES 2020

One of my favorite aspects of my job is the opportunity to hear from a wide array of partners, brands, and digital agencies leading the way in voice technology. A few times a year, many of these leaders and innovators get together in the same place to discuss their ideas openly. Jargon had the pleasure of attending two conferences in January where this happened: CES in Las Vegas, NV and Project Voice in Chattanooga, TN. Here are the key trends in voice that I'm tracking closely as we enter the new decade:

1. Rapid proliferation of voice-enabled devices continues

The IoT dream of the past decade is taking shape and building momentum. Of all the voice industry metrics, the growth in the number and variety of voice-enabled devices is one of the most boast-worthy. Amazon revealed there are now hundreds of millions of Alexa-enabled devices. Google had a similar announcement at 10 times that figure, but its metric is less telling because it includes Android mobile devices, where Google Assistant comes pre-installed. Judging from the exhibition floors in Las Vegas, Alexa and Google Assistant appear to be head-to-head in their dominance of smart home and wearable device integrations.

Google showed off their voice integrations through a clever live demonstration at CES. The premise was a typical evening where dinner plans changed and we, the participants, were able to use Google Assistant to quickly find a solution for the evening. The variety of devices and use-cases in the home, in the car, and on the go (the grocery store, in this demo) was a clear theme.

2. Brands are bringing more complex voice use-cases to the table

We've seen impressive user engagement metrics from Alexa and Google Assistant, but the vast majority of these interactions are still simple commands – weather, music, and remote-like controls for smart-homes. For many brands exploring voice, the desired use-cases are more complex. The interactions are open-ended and might require a few back-and-forths in the dialog. Building this is not easy. It's challenging enough to create an application that can understand these complex requests; then comes the challenge of serving relevant and useful voice content.

The brands I spoke to are taking this challenge head-on. This year I saw a significant increase in the number of brands with an established voice strategy. They've assembled voice-focused teams to design and support use-cases that go beyond the low-hanging fruit of simple command-based interactions. In particular, companies with large catalogs of media assets, like Disney, Discovery, NPR, and BBC, are working to adapt their existing content for the growing voice medium.

3. Deeper pairings of voice and visuals to solve discovery and improve experiences

Google, the expert in search and discovery, admitted in their presentation at CES that voice discovery is still broken. To paraphrase: no one is going to save the industry from the voice discovery issue. The issue lies in the challenge of finding and remembering the capabilities of the voice experience. Even when a user has discovered a feature, it's challenging to remember the exact phrasing needed to get there. Voice-only interactions lack the guidance that screen-based interactions provide. You know which apps are on your phone because you can see them. And you know what the apps can do because you're presented with navigation menus that list the options. A voice assistant that constantly reminded you of all the available possibilities would be completely unusable!

There's an ongoing trend toward voice input and screen-driven output to address this issue. A screen pairing is an efficient way to add guidance, while voice input cuts the time it takes to interact. In one of their presentations, WillowTree pointed out that speaking is roughly three times faster than typing (130 wpm compared to 40 wpm) and reading is roughly twice as fast as listening (250 wpm compared to 130 wpm). If experiences are designed with this in mind, voice prompts will become habitual over time and visuals will act as the main information delivery medium.
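
To make those numbers concrete, here is a rough back-of-the-envelope sketch in Python. The four rates are the ones quoted above; the 10-word request and 50-word response are made-up illustrative values, not figures from the presentation, and the function names are purely hypothetical.

```python
# Rates quoted above, in words per minute.
SPEAK_WPM = 130
TYPE_WPM = 40
READ_WPM = 250
LISTEN_WPM = 130

# Assumption for illustration only: a short request and a longer response.
REQUEST_WORDS = 10
RESPONSE_WORDS = 50


def seconds_for(words: int, wpm: int) -> float:
    """Seconds needed to get through `words` words at `wpm` words per minute."""
    return words / wpm * 60


def interaction_seconds(input_wpm: int, output_wpm: int) -> float:
    """Total time for one request (input) plus one response (output)."""
    return (seconds_for(REQUEST_WORDS, input_wpm)
            + seconds_for(RESPONSE_WORDS, output_wpm))


# Three ways to run the same interaction.
voice_in_screen_out = interaction_seconds(SPEAK_WPM, READ_WPM)
voice_in_voice_out = interaction_seconds(SPEAK_WPM, LISTEN_WPM)
typed_in_screen_out = interaction_seconds(TYPE_WPM, READ_WPM)

for label, secs in [
    ("voice in, screen out", voice_in_screen_out),
    ("voice in, voice out", voice_in_voice_out),
    ("typed in, screen out", typed_in_screen_out),
]:
    print(f"{label}: {secs:.1f} s")
```

Under these assumed word counts, the voice-in, screen-out pairing comes out fastest of the three, which is the point of the trend. The exact gap depends heavily on how long the request and response are.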

I think we can go further than this and create experiences that are not only faster but more engaging. There is an opportunity for rich voice responses that provide context for what's on the screen, just as a presenter in a well-done live presentation walks the audience through what they're seeing on the screen. As I pointed out in a previous article, audio has an amazing ability to hold our attention. If we integrate voice responses well, we can capture the user's attention and guide them through complex interactions.

4. Rethinking the structure of 3rd party voice experiences

One of the main benefits of voice-first interactions is efficiency. In many cases, the voice app path of invocation - platform > voice app name > intent - is quite a mouthful and not efficient. Instead, domains, or "light integrations" as Google referred to them, are becoming the preferred solution for situations where efficiency is a priority. With a domain, the user can go straight to their intent and the platform will recommend the best option and facilitate the handoff to the brand. Google showed this off in a demo where the user bounced between 1st party features, like accessing the camera and flashlight, and 3rd party mobile apps, like IMDB, Instagram, and Walmart. All of this was done using implicit language, with no platform or app invocations between intents. It was a much more efficient interaction than invoking individual voice apps.

This level of agency in the hands of companies like Amazon and Google, who often have competing solutions of their own, is unsettling to many brands. To get brands comfortable, platforms will need to offer more transparency on how to become the recommended solution for a particular domain. They will also need to provide ample opportunities for users to set preferences and allow space for brands to pitch their solutions.

5. More proprietary voice assistants

Consumer companies have a complex relationship with Amazon and Google. On one hand, they see Alexa and Google Assistant as new channels for reaching users; on the other hand, they want to maintain control and ownership over the customer experience. From a product perspective, misalignment on priorities and timelines can also be frustrating. To combat this, more and more brands are considering proprietary voice assistants, either built in-house or through white-labeled solutions from companies like Houndify, Clink, and Microsoft.

Voicebot.ai's Bret Kinsella made a bold prediction on stage at CES that 25% of Fortune 500 companies will have their own voice assistant by 2025. We've already seen movement in this direction with BBC, Salesforce, Sonos, BofA, Capital One, Mercedes, and Deutsche Telekom developing proprietary assistants. The success of these assistants will be determined by the focus of their core use-case and the ability to overcome the hardware issue of having a microphone and speaker available for distribution.

The voice industry continues to evolve

It's incredibly exciting to see the determination of industry experts at CES and Project Voice as the voice industry continues to evolve. There is a healthy ongoing dialog on how to push past the industry challenges we face, like designing complex interactions, voice discovery, and the structure of voice experiences. As a result, I saw far more exciting live demonstrations this year compared to last.

If your company has a voice strategy or is starting to explore voice solutions, I'd love to hear from you. Please feel free to reach out on social media or through our website.