Generative AI has captured the public imagination with a leap into creating elaborate, realistic text and images from verbal prompts. But the catch – and there’s often a catch – is that the results are often far from perfect when you look a little closer.
People point out mangled fingers, floor tiles that slip out of alignment, and math problems that are just that: problematic. Sometimes they simply don't add up.
Now, Synthesia — one of the ambitious AI startups working on video, with customized avatars designed for business users to create promotional, educational and other corporate video content — is rolling out an update it hopes will help it overcome some of the challenges specific to the field. Its latest version features avatars, modeled after real people captured in its studio, that display more emotion, better lip tracking and, it says, more natural and expressive human movements when fed text to create video.
The launch comes after some impressive progress for the company to date. Unlike other prolific AI players like OpenAI — which has built a two-pronged strategy, raising massive public awareness with consumer tools like ChatGPT while also building a B2B offering, with its APIs used by independent developers as well as by enterprise giants — Synthesia leans toward the approach taken by some other prominent AI startups.
Similar to Perplexity's focus on building truly immersive generative AI search, Synthesia focuses on building the most human-like video avatars it can. More specifically, it is looking to do that only for the enterprise market, for use cases such as education and marketing.
This focus has helped Synthesia stand out in a very crowded AI market that runs the risk of becoming commoditized as the hype gives way to longer-term concerns like ARR, unit economics and the operational costs associated with AI applications.
Synthesia describes its new Expressive Avatars, released Thursday, as the first of their kind: "the world's first fully AI-generated avatars." Built on large, pre-trained models, Synthesia says its breakthrough was in how it combined them to achieve multimodal distributions that more closely mimic the way real people speak.
These avatars are created on the fly, Synthesia says, and are meant to be closer to the experience we have when we speak or react in real life. This contrasts with the way many avatar-based AI video tools work today: typically, multiple pieces of video are quickly stitched together to create facial responses that more or less align with the scripts fed to them. The goal is to look less robotic and more alive.
Previous version:
New version:
As you can see in the two examples here, one from the older version of Synthesia and one from the version being released Thursday, there is still a way to go, something CEO Victor Riparbelli himself admits.
"Of course it's not 100% there yet, but it will be very, very soon, by the end of the year. It's going to be so shocking," he told TechCrunch. "I think you can also see that the AI part is very thin. With humans there is so much information in the tiniest details, the tiniest movements of our facial muscles. I think we could never sit down and describe, 'Yeah, you smile like that when you're happy, but that's fake, right?' This is such a complicated thing to ever describe for humans, but it can be [captured in] deep learning networks. They're really able to understand the pattern and then reproduce it in a predictable way." The next thing he's working on, he added, is the hands.
“Hands are, like, super hard,” he said.
The B2B focus also helps Synthesia anchor its messaging and product more on the "safe" use of AI. This is important, especially given the huge concern today about deepfakes and the use of AI for malicious purposes such as disinformation and fraud. Even so, Synthesia hasn't been able to completely avoid controversy on this front. Synthesia's technology was misused in the past to produce propaganda in Venezuela and false news reports promoted by pro-China social media accounts.
The company noted that it has taken further steps to try to limit such misuse. Last month it updated its policies, it said, "to limit the type of content people can create by investing in early detection of malicious actors, increasing teams working on AI security, and experimenting with content credential technologies like C2PA."
Despite these challenges, the company has continued to grow.

Synthesia was last valued at $1 billion when it raised $90 million. That fundraise took place almost a year ago, in June 2023.
Riparbelli said in an interview earlier this month that there are currently no plans to raise more, though that doesn't really answer the question of whether Synthesia is being approached proactively. (Note: We're very excited to have Riparbelli speak at our London event in May, where I'll definitely be asking this again. Come if you're in town.)
What we know for sure is that AI costs a lot of money to build and run, and Synthesia has built and runs a lot.
Before Thursday's release, about 200,000 people had created more than 18 million video presentations in about 130 languages using Synthesia's 225 legacy avatars, the company said. (It doesn't make clear how many users are on paid tiers, but there are plenty of big-name customers like Zoom, the BBC, DuPont and more, and businesses are paying.) The startup's hope, of course, is that with the new version out, these numbers will climb even higher.