It’s AI Way Or The Highway
As content surges, intelligent, time-saving software tools allow creators to focus more on their craft.
By: Brad Grimes
Digital audio is having a moment. In 2021, podcast listening was up (again), according to Edison Research, and the share of people consuming any kind of streamed audio — music, radio, even the emerging category of “spoken” news articles — reached 68 percent. That’s up 20 percent from five years ago. And the money is following.
According to the Interactive Advertising Bureau, ad spending on digital audio rose 58 percent last year, dwarfing the increases for streaming video and even social media. This means audio engineers and content producers are busy. Not only is demand for their work rising, but the quantity of digital audio being created is too. “Teams’ workloads are going way up as production shops look to do more,” says audio engineer Rob Byers, who’s worked with NPR and American Public Media and is currently Technical Director for Criminal Productions at Vox Media. “Plus, there’s a lot of content out there that does not necessarily have a professional audio engineer involved.”
On the production side, it’s never been easier for creators to capture digital audio, whether on their phones, in-studio, using digital field recorders, or through online recording platforms like Riverside.fm or SquadCast. The challenge is turning all that content into compelling, high-quality audio for consumption by an increasingly discerning public. “The quantity of content being produced is growing exponentially, but audio quality has suffered at times,” explains Audun Solvang, Chief Technology Officer and co-founder of Nomono. Nomono makes podcasting solutions that allow creators and broadcast journalists to capture audio more easily — including spatial audio — while automating processing tasks through cloud-based artificial intelligence technology. “It’s unfortunate when you hear compelling content, but the audio quality is inferior and distracts from the story,” Solvang says. “Capturing great sounding audio from the very beginning is most important, but audio engineers often end up having to spend hours cleaning up poorly recorded audio. That’s where software and AI can help free up time for those engineers.”
NEW WAYS OF UNDERSTANDING AUDIO
Jonathan Wyner is chief engineer at M Works Mastering, past president of the Audio Engineering Society, and a professor at Berklee College of Music in Boston. He’s also education director for iZotope, which develops software for audio applications, and has seen the urge from audio engineers and producers to branch out. “The diversification in audio workflows, outputs, and ways audio is deployed, whether for entertainment, education, enterprise, or some other purpose, has really exploded,” Wyner says. “There’s motivation for engineers to be more versatile, and with machine learning and other new technologies coming on, we’re going to see a whole host of tools that accelerate aspects of our work.”
iZotope, for example, created a software plug-in for tonal balance control in sound mixes that leverages machine learning. It’s based on an idealised range of tonal balance derived from many thousands of existing recordings. “A simple idea,” Wyner says, “but it wasn’t until we had the ability to ingest and analyse all this information digitally that we could present it back to the user and help them improve the sound of their mix.” The company also develops technology to perform automatic source separation, effectively ‘un-mixing’ source audio into its component parts for easier manipulation. Examples might include isolating a vocal from a song, or the dialog from an interview conducted in a noisy location. “These systems aren’t making decisions for us,” Wyner says. “But they’re offering a new way of intelligently understanding and extracting audio features using a data approach.”
THE ROLE OF ARTIFICIAL INTELLIGENCE
Artificial intelligence is already permeating audio engineering and content creation. For professional engineers, this means less time fixing nagging flaws and more time contributing to storytelling or sound design. For professional producers and a growing legion of do-it-yourself podcasters, it means achieving higher-quality audio without having to understand, for example, tonal balance or EQ settings or simple editing tasks. “We now have AI-based tools that help identify and correct some of the biggest mistakes in audio,” says Jay LeBoeuf, head of business and corporate development at Descript. “And we can use the extra time we gain to improve our storytelling and make our material sound even better. The goal is to help remove the tedious work that stands between an idea and its expression so creators can focus on telling stories instead of tinkering.”
Descript uses AI to make audio editing like text editing. Several high-profile podcast producers, including NPR and The New York Times, use it to simplify the process. The platform includes a feature called Overdub that creates a digital clone of a speaker’s voice so it can be used during editing. “Let’s say we’re doing a podcast and I get your name wrong,” LeBoeuf explains. “Before, I’d need to go back into the studio and re-record those parts, but now I can type your name into the edit, and it automatically fixes your name in my voice.”
AI FOR MUNDANE TASKS
Byers says producers have caught on to the ways new software can improve audio content and have begun to expect higher quality, all while generating more audio in the same amount of time. “For instance, removing tiny mouth clicks from the audio,” he says. “Now that we have software tools that can automatically de-click our audio, producers expect it.”
Nomono’s AI-powered processing automatically enhances dialog while reducing noise and crosstalk in podcast recordings. Armed with clean, high-quality recordings, audio engineers and producers can make more effective use of the many new AI-based editing tools. The company also develops solutions to organise audio content and streamline collaboration throughout a production. “For the sake of speed, we need tools that make it possible to collaborate on an edit, pull out the best audio, and generate something clear, concise, and thoughtfully crafted,” says Solvang. “It’s how we arrive at those perfect few minutes of audio from a two-hour recording.”
LeBoeuf believes there’s never been a better time for audio engineers, with demand growing for new and richer podcasts, as well as more immersive audio experiences in entertainment, games, and something called the Metaverse. But that also means engineers are being pulled in more directions and need tools to automate repetitive tasks. “And on the creator side,” says LeBoeuf, “the tools help them avoid the most common mistakes and sound better as they go, because they may not have the time or the budget for a dedicated audio engineer.”
No one believes, however, that software can take over for audio engineers, especially with consumer demand raising the bar on high-quality audio. If anything, says Byers, engineers know best what audio challenges lie ahead. “Do audio engineers feel like their livelihood is at risk because of this software? No,” he says. “As easy as these tools have become, there are always opportunities for more problems to be solved.”
Brad Grimes is a long-time technology journalist and former communications director of the Audiovisual and Integrated Experience Association.