Screens grab your attention; audio holds onto it

Shaun Withers

Shaun Withers

shaun withers jargon

Is visual and aural attention inherently different? Are there lessons we can learn from the rise of podcast listening that can be applied to voice content design? Shaun Withers, a team member at Jargon, takes a look at the characteristics that make listening unique, and the potential it presents for the voice industry. This blog post was originally posted in the UX Collective publication on Medium.



Our attention bounces between tabs, apps, threads, channels, and feeds all day. We try our best to focus on just one crucial task. Instead, we pick up our phones 150 times per day in response to shiny notifications and alerts. Visuals are very effective at quickly grabbing our attention. But once a visual has our attention, it’s not long before another visual snatches it away again.

Despite this new world of fragmented attention, the recent rise in popularity of podcasting has shown us that focus can be sustained. It’s shown us that deep concentration, empathy, and comfort can come from listening. The majority of podcast listeners listen to the end of the podcast. They even pay close attention to the ads they hear and take action. There must be something unique about podcast listening to achieve this level of attention.

If audio content has superpowers, it seems obvious that voice assistants like Alexa, Siri, and Google Assistant would take full advantage of aural attention. That hasn’t happened yet. Until now, it has been the convenience of hands-free commands, and not the interactive listening experience, that has driven the adoption of voice assistants. There is an incredible opportunity to create deeply engaging experiences, delivered through interactive voice assistants. To accomplish this, let’s unpack why listening is so special.

Listeners hang on to every word



To understand the intricacies of attention, we must first understand how we take in information. When we read, our eyes can quickly shift — far, close, up, down. We can scan our environment and build an understanding of the many moving things around us, but a new captivating visual can quickly distract us.

According to a Microsoft research report, our attention span has dropped from 12 seconds in 2000 to 8 seconds in 2013, less than the average goldfish. Multi-screen behavior is among the leading causes.

In comparison, ears can’t move, causing us to focus on one thing at a time. Have you ever tried to have a conversation with someone while a loud commercial is blaring in the background? It’s distracting and debilitating. When we listen, we direct our attention to a single stream of dialog. If that dialog is interesting, our attention tends to stay there and we slip into deep focus. If the dialog is important, we intentionally pay close attention because the content disappears once it’s been said. Most often, there is nothing left to reference once the dialog is over, so we hang on to every word.

Listeners understand the storyteller’s state of mind



In his book, Sapiens, Yuval Noah Harari argues that humans’ capability to tell captivating stories sets us apart from other species on earth. It’s the reason we can collaborate in the billions and pass on our innovations across generations. The stories we tell communicate thoughts and ideas, but just as importantly, they communicate emotional context. We digest the sentiment and meaning of the storyteller through the tone of their voice.

When we read, we use our own internal voice to emulate the storyteller. When we listen, we hear the cues and characteristics of someone else’s voice first hand, painting a picture of their emotional state. Listening is a far more intimate experience than reading. It allows us to dig much deeper into understanding the storyteller.

Nicholas Epley, Professor of Behavioral Science at Chicago Booth writes, “You can’t see the mind of another person, but my research suggests you can hear it. Hearing somebody talk is the closest you’re ever going to get to their ongoing mental experience, and it’s not just the content of the words that provides this. Paralinguistic cues of all sorts provide honest signals for the presence of thinking and feeling.”

Side note — Tone of voice doesn’t only apply to human voices. Voice apps on Alexa and Google Assistant can deliver synthetic voices, which are AI generated emulations of the human voice. They are improving quickly, but don’t yet sound natural. To combat this, voice app designers can use markup tools to change the intonation, pitch, emphasis, etc. of synthetic voices. This allows the designer to give the voice any tone she would like.

Listeners are more likely to empathize



Ever feel like social media threads feel a bit more like road rage than a social gathering? It turns out, our empathy levels are different when listening and reading the same content.

As Juliana Schroeder, a professor at the University of California at Berkeley, puts it, “Beliefs that are communicated by voice make the communicator seem more reasonable, even human. But those same beliefs are stripped of the humanizing elements when the opinions are communicated on a piece of paper.”

In her study, she determines that the participants that read arguments from others “dehumanized” the storyteller and regarded them as “having a diminished capacity to either think or feel.” In contrast, the participants who listened to the argument were much less dismissive and could understand the other’s reasoning.

Podcast ads are often read by the host, leading to a much higher ad conversion rate than other media. The trust and empathy for the host trickles into the authenticity of their message. The listener finds the message more believable, even though they understand that it’s paid for and mostly scripted by a company. If persuasion is the goal of the story, as it often is, delivering the message audibly will increase it’s effectiveness.

Listening is comfortable



We started listening before the day we were born. It’s deeply tied to our desire to feel connected. Listening to audio content can emulate these feelings of connection, even if it’s one directional. It provides a level of comfort and relaxation, similar to being read a story by a parent as a child.

According to iHeartRadio, Millennials and Gen Z listen to more than 18 hours of audio content per week. One the main reasons they turn to podcasts is to relax and calm down. Podcast listening can be an effortless way to follow a story or conversation while you’re busy doing something else. Nothing is more evident of this comfort than the podcast industry’s remarkable attentiveness metric — 85% of people who listen to podcasts, listen to the end, according to Edison Research. That is a far higher retention rate than any other form of media.

The possibilities for sustained attention



The popularity of podcasting has shown us that we’re hungry for a medium that sustains our attention. It evokes deep concentration, heightened empathy, and comfort. It’s no surprise that tech giants are investing billions of dollars into voice assistant technology, which taps into this medium. They want to maintain their grip on the attention economy when it shifts from visual to aural attention.

The truth is, voice assistants have yet to match the comfort and enjoyment of listening to a great podcast. In fact, smart speakers are rarely even used for listening to podcasts. It’s much easier to select a podcast on your phone and listen with headphones. Voice assistants are primarily used for simple interactions — what’s the weather, set a timer, and play a song.

How podcasting, or storytelling more generally, will intersect with voice technology in the future has yet to be determined. For conversations to feel natural on voice, the dominant voice platforms need to make a few significant improvements. This will unlock the ability to design incredible experiences that take full advantage of the unique powers of listening.

The next time you tell a story, in any form, consider how the audience engages with the story and where their attention lies. There is much more nuance to attention than one might think. The choice of media used to deliver the story plays a significant role in the attention you capture. Do you want to grab someone’s attention, or hold onto it?

shaun withers jargon

CREDIT: @ESMITH_IMAGES/INSTAGRAM​


We would love to hear what you think. Do you think podcast-style content could be effective in voice apps? What changes need to be made for voice apps to be truly engaging listening experiences? Will podcasting evolve to become interactive? Reach out to us with your thoughts or comment directly on medium.



Looking to start a career at Jargon? Check out these additional resources below:
About Jargon
Jargon Careers


Image Description

About Jargon

Jargon enables voice applications on services like Amazon Alexa and Google Assistant to structure, manage, and optimize their content. Join our mailing list to stay current with Jargon news and product updates!

Join our Mailing List