The Future of Podcasting is AI

Published

November 2, 2022

Roughly 22,000 new podcasts launch every month. There are close to 2.5 million podcasts (more than 71 million episodes) in the Apple Podcasts directory right now, according to Podcast Industry Insights. And those are just the ones we know about.

“A lot of podcasters aren't even going through the big platforms now. They’re going direct to their listeners, selling premium content and having big success,” says Andy Taylor, formerly of BBC Radio and founder of Cardiff-based R&D consultancy Bwlb.

And that’s to say nothing of the growing volume of podcast-like content, whether created by brands for promotion or event producers that want, for example, to make talks available on-demand. Every piece of content needs to be produced and distributed, whether by audio professionals or folks learning the craft. Therefore, the more they can automate large swaths of production, the more they can focus on the content.

“The different places audio is being published have just exploded,” explains Jonathan Wyner, chief engineer at M Works Mastering and a professor at Berklee College of Music in Boston. “With all those contexts, there is a real motivation and imperative for creators to be more versatile.”

Not to mention, more productive and efficient.

The Rise of AI

Artificial intelligence (AI) — software that can automate tasks previously done by humans — holds the key to handling the tsunami of podcast content. Not only can AI speed up production, it can make podcasts sound better and set the stage for the audio experiences of tomorrow.

“AI basically helps take care of repetitive tasks to quicken the workflow of the podcaster,” explains Manos Chourdakis, research engineer at Nomono, which develops AI-based podcasting tools. “For example, with AI, you don't have to listen to a whole podcast to find where someone said something wrong, then replace or remove it. You could do that yourself, but AI does it faster.”

Then there are chores, such as removing noise or enhancing dialogue, that can only be accomplished with AI, at least at scale. “Good-quality dialogue enhancement would be impossible without AI,” Chourdakis says. “At least impossible in a reasonable timeframe using traditional tools.”

Perfect for Menial Tasks

Applications of AI in podcasting are as varied as production tasks. Some are built directly into podcast platforms. When creators upload their podcasts to hosting platform Podcast.co, the system automatically “listens” to the audio files and normalizes sound levels.

“Any tool that can help reduce the mind-numbing bits of a job is a good thing,” says Mike Cunsolo, the platform’s co-founder. Cunsolo also runs Cue, a podcast production company working with corporate brands, and Matchmaker.fm, which connects podcast producers with guests. “You’ll always need that human expertise element, but soon machines could learn to understand what makes a podcast interesting and reduce time on task.”
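The level normalization mentioned above can be illustrated with a minimal sketch. It is not Podcast.co's implementation, which the article does not describe; it assumes mono audio held as floats between -1.0 and 1.0, and the function names, the -20 dBFS target, and the peak ceiling are all hypothetical choices made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target_dbfs=-20.0, peak_ceiling=1.0):
    """Scale samples so their RMS level matches target_dbfs.

    Assumption: a single gain value is applied to the whole file.
    If that gain would push the loudest sample past peak_ceiling,
    the gain is reduced so the audio does not clip.
    """
    current = rms(samples)
    if current == 0.0:
        return list(samples)  # pure silence: nothing to scale

    target_linear = 10 ** (target_dbfs / 20.0)  # dBFS to linear amplitude
    gain = target_linear / current

    peak = max(abs(s) for s in samples)
    if peak * gain > peak_ceiling:
        gain = peak_ceiling / peak

    return [s * gain for s in samples]


# Usage: a very quiet one-second 440 Hz tone at 8 kHz is raised to the target.
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
leveled = normalize_rms(quiet)
```

Production systems generally measure perceived loudness rather than raw RMS, so this sketch shows only the basic idea of measuring a level and applying a corrective gain.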

Solution provider Descript applies AI to many aspects of podcast engineering, including noise removal and echo control. One of the more “mind-numbing” chores Descript can handle is room tone.

“Sometimes producers need to insert digital silence into a podcast. Maybe between edits or to drag out the spacing between sentences,” says Jay LeBoeuf, head of business and corporate development at Descript. “But that sounds incredibly unnatural.”

If producers didn’t capture room tone when a podcast was recorded, they may have to go back and get it. Or they can listen for it in the recording, copy-and-paste where needed, then edit the result to make it blend naturally.

Or computers can handle it. Descript’s AI-based room tone generator analyzes a recording, identifies the room tone, and automatically synthesizes it where it’s needed. Such technology not only obviates menial tasks, it allows for greater production flexibility.
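The manual copy-and-paste approach described above can be sketched in a few lines. This is a deliberately simple stand-in and not Descript's method, which synthesizes new room tone with AI: here the quietest stretch of the recording is assumed to be room tone and is looped to fill a gap. The function names and the fixed window size are hypothetical.

```python
def quietest_window(samples, window):
    """Return the lowest-energy block of `window` samples.

    Assumption: the quietest non-overlapping block of the recording
    contains only room tone (no speech).
    """
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")

    best_start, best_energy = 0, float("inf")
    for start in range(0, len(samples) - window + 1, window):
        energy = sum(s * s for s in samples[start:start + window])
        if energy < best_energy:
            best_start, best_energy = start, energy
    return samples[best_start:best_start + window]


def insert_room_tone(samples, position, length, window):
    """Insert `length` samples of looped room tone at `position`,
    instead of inserting digital silence (zeros)."""
    tone = quietest_window(samples, window)
    filler = [tone[i % len(tone)] for i in range(length)]
    return samples[:position] + filler + samples[position:]


# Usage: loud "speech" on both sides of a low-level stretch of room noise.
recording = [0.5] * 100 + [0.01, -0.01] * 50 + [0.5] * 100
padded = insert_room_tone(recording, position=100, length=30, window=50)
```

Plain looping can produce audible repetition or clicks at the loop boundary, which is part of why the blending step described above is still needed when the work is done by hand.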

“AI is going to allow us to use less expensive hardware, worse-sounding rooms, and noisier locations and still get good results,” says Nomono’s Chourdakis.

New AI-Based Capabilities

AI also opens the door to innovation in podcasting — creating new solutions that raise the bar for podcasters and listeners. For example, the Epidemic Audio Reference (EAR) tool helps podcasters find copyright-free music based on songs they like.

“Say you’re looking for intro or outro music, and you’re thinking of a particular song, but it’s protected by copyright,” says Chourdakis. “The system uses AI under the hood to help you find something similar.”

At Bwlb, Taylor’s team developed Accordion, an AI-based solution that can take a podcast and reproduce it at various lengths.

“Every other part of our life is getting smarter — smart homes, smart refrigerators,” Taylor says. “People want more control and convenience from their podcast experience, too.”

When Taylor worked on documentaries for the BBC, he’d be asked for shorter versions to run on different platforms. The process was always manual. Accordion applies software algorithms to podcast content to intelligently create versions of different lengths. “It doesn’t speed anything up,” Taylor says, “but it gives the user control over the duration of the content without losing tone, structure or listenability.”

Putting the Focus on Immersive Storytelling

The more podcasters use AI tools, the better those tools become. In other words, the more data the tools ingest, the more they learn.

Nomono’s dialogue enhancement algorithms are based on large datasets of voice recordings — some clean and intelligible, some less so — which teach the AI tools how to generate better sound. “Podcasters shouldn’t need advanced audio knowledge to produce high-quality audio,” says Chourdakis. “By automating some of these tasks, they can spend more time focusing on great storytelling, and less time on tedious clean-up tasks.”

And in the future, podcasters can move more easily toward a new genre of immersive, spatial podcasts. For example, Nomono’s technology enables object-based audio production, which allows producers to “place” voices in a 3D soundscape or create dynamic versions that can be tailored to listeners.

“Media production is now entering a phase where if you can dream it, it can happen,” says Descript’s LeBoeuf. “And you no longer need to have an expensive studio or decades of training to accomplish your goals.”