Noisy interview and speech recordings are the bane of sound engineers’ existence. But a German startup hopes to fix that with a unique technical approach that uses generative AI to improve the clarity of voices in video.
Today, AI-coustics emerged from stealth with €1.9 million in funding. According to co-founder and CEO Fabian Seipel, AI-coustics goes beyond standard noise cancellation to work on — and with — any device and speaker.
“Our core mission is to make every digital interaction, whether on a conference call, consumer device or casual social media video, as clear as a broadcast from a professional studio,” Seipel told TechCrunch.
Seipel, a trained audio engineer, co-founded AI-coustics with Corvin Jaedicke, a lecturer in machine learning at the Technical University of Berlin, in 2021. Seipel and Jaedicke met while studying acoustic technology at TU Berlin, where they often encountered poor sound quality in the online courses and seminars they had to attend.
“We have been driven by a personal mission to overcome the pervasive challenge of poor audio quality in digital communications,” Seipel said. “While my hearing is slightly impaired due to music production in my early twenties, I’ve always struggled with online content and lectures, which led us to work on speech quality and intelligibility first.”
The market for AI-powered noise reduction and voice enhancement software is already very strong. Competitors to AI-coustics include Insoundz, which uses generative AI to enhance streams and pre-recorded speech clips, and Veed.io, a video editing suite with tools to remove background noise from clips.
But Seipel says AI-coustics has a unique approach to developing AI mechanisms that do the actual work of noise reduction.
The startup uses a model trained on speech samples recorded at the startup’s studio in Berlin, AI-coustics’ home city. People are paid to record samples—Seipel wouldn’t say how many—that are then added to a dataset to train AI-coustics’ noise-reduction model.
“We developed a unique approach to simulating acoustic artifacts and problems — e.g., noise, reverberation, compression, band-limited microphones, distortion, clipping and so on — during the training process,” Seipel said.
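Seipel doesn’t detail the pipeline, but artifact simulation of this kind is typically a data-augmentation step: a clean recording is synthetically degraded so the model can learn to map the degraded version back to the clean one. Here is a minimal, hypothetical sketch of what such a degradation function might look like (the function name, parameter choices and specific artifact mix are illustrative assumptions, not AI-coustics’ actual method):

```python
import numpy as np

def simulate_artifacts(clean, sr=16000, rng=None):
    """Degrade a clean speech waveform with synthetic artifacts.

    (degraded, clean) pairs produced this way can serve as training
    data for a speech-enhancement model.
    """
    rng = rng or np.random.default_rng()
    x = clean.astype(np.float64).copy()

    # 1. Additive background noise at a random SNR between 5 and 20 dB.
    snr_db = rng.uniform(5, 20)
    noise = rng.standard_normal(len(x))
    sig_pow = np.mean(x ** 2) + 1e-12
    noise_pow = np.mean(noise ** 2) + 1e-12
    noise *= np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    x = x + noise

    # 2. Band-limiting: a crude moving-average low-pass filter,
    #    mimicking a narrow-band microphone.
    k = int(rng.integers(2, 8))
    x = np.convolve(x, np.ones(k) / k, mode="same")

    # 3. Clipping/distortion: hard-limit the waveform amplitude.
    clip_level = rng.uniform(0.5, 0.9) * np.max(np.abs(x))
    x = np.clip(x, -clip_level, clip_level)

    return x
```

A real pipeline would also simulate reverberation (e.g., convolving with room impulse responses) and codec compression, which are omitted here for brevity.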
I’ll bet some will take issue with AI-coustics’ one-time compensation program for creators, given the model the startup is training could prove quite lucrative in the long run. (There is a healthy debate about whether the creators of training data for AI models deserve residuals for their contributions.) But perhaps the bigger, more immediate concern is bias.
It’s been proven that speech recognition algorithms can develop biases — biases that end up harming users. One study published in the Proceedings of the National Academy of Sciences showed that speech recognition from leading companies was twice as likely to incorrectly transcribe audio from Black speakers as from white speakers.
In an effort to combat this, Seipel says AI-coustics focuses on recruiting a “diverse” range of speech sample contributors. He added: “Size and diversity are key to eliminating bias and making technology work for all languages, speaker identities, ages, accents and genders.”
It wasn’t the most scientific test, but I uploaded three video clips — an interview with an 18th-century farmer, a car driving demonstration and a protest over the Israel-Palestine conflict — to the AI-coustics platform to see how well it performed with each. AI-coustics did deliver on its promise of clarity enhancement. To my ears, the edited clips had far less background noise drowning out the speakers.
Here is the 18th-century farmer clip before:
And after:
Seipel sees AI-coustics technology being used to enhance real-time as well as recorded speech, and perhaps even being integrated into devices such as sound bars, smartphones and headphones to automatically enhance voice intelligibility. Currently, AI-coustics offers a web application and API for post-processing audio and video recordings and an SDK that brings the AI-coustics platform to existing workflows, applications and hardware.
Seipel says AI-coustics – which makes money through a combination of subscriptions, on-demand pricing and licensing – currently has five enterprise customers and 20,000 users, though not all of them are paying. On the roadmap for the coming months: expanding the company’s four-person team and refining the underlying speech-enhancement model.
“Prior to our initial investment, AI-coustics ran a fairly lean operation with a low burn rate in order to survive the rigors of the VC investment market,” said Seipel. “AI-coustics now has a significant network of investors and consultants in Germany and the UK for advice. A strong technology base and the ability to address different markets with the same database and core technology gives the company flexibility and the potential for smaller pivots.”
Asked whether sound mastering technology like AI-coustics’ could steal jobs, as some experts fear, Seipel noted the potential for AI-coustics to speed up time-consuming tasks that currently fall to human audio engineers.
“A content creation studio or broadcast director can save time and money by automating parts of the audio production process with AI-coustics while maintaining the highest speech quality,” he said. “Speech quality and clarity continues to be a nagging problem in almost every consumer or professional device, as well as content production or consumption. Any application where speech is recorded, processed or transmitted can potentially benefit from our technology.”
The funding came in the form of an equity and debt tranche from Connect Ventures, Inovia Capital, FOV Ventures and Ableton CFO Jan Bohl.