AI video startup Tavus raises $18 million to bring face and voice cloning to any app

Tavus, a four-year-old generative AI startup that helps companies create digital “replicas” of people for automated personalized video campaigns, has confirmed $18 million in new funding and revealed it’s opening up its platform so that third parties can integrate the company’s technology into their own software.

Reports emerged in August that Tavus had raised “about $18 million,” but details were scant. The company has now confirmed to TechCrunch that it has indeed raised $18 million in a Series A round led by Scale Venture Partners, an early-stage VC that has previously backed companies like Box, HubSpot, and DocuSign. Other notable investors include Sequoia, which led Tavus’ $6.1 million seed round last year, as well as Y Combinator (YC) and HubSpot.

Video is the focus

The generative AI movement is best exemplified by text-based chatbots like ChatGPT and text-to-image models like DALL-E, both of which come from OpenAI. But if the past few months are anything to go by, generative AI could be on the cusp of another mini-revolution, with video taking center stage.

OpenAI recently introduced Sora, a text-to-video model that could transform the creative industry as we know it. But it’s far from the only player in town, with tech giants like Google working on similar tools for several years, not to mention a number of startups that have raised significant chunks of VC cash over the past year for various takes on how generative AI can intersect with video.

Tavus, for its part, works with its customers to create replicas of people through voice and face cloning. The idea is that sales and marketing teams can use Tavus to send personalized videos to prospects at scale, or a product team can create personalized walkthrough videos to onboard new customers, all through simple text prompts that leverage the previously created digital replica. And by integrating Tavus with third-party systems like Salesforce or Mailchimp, companies can automate much of this. For example, a customer who fills out an online form requesting more information about a product can be emailed a video immediately, with a salesperson addressing the prospect by name and explaining the next steps.

Tavus has managed to secure some pretty big clients in its short life so far, including Salesforce and Facebook parent Meta, both of which, co-founder and CEO Hasan Raza said, use the platform to upsell their respective B2B clients through personalized demo videos.

Tavus as a platform

So far, Tavus has been delivered through a SaaS application, through which customers create their own AI video templates. The onboarding process requires a person, such as the CEO or a sales executive, to record a 15-minute video based on a script provided by Tavus.

Tavus cloning in action. Image credits: Tavus

This recording is then used to train the AI, after which the user goes to a web editor and selects which parts of the video they want to personalize by setting variables such as location, executive name, company or product. By connecting Tavus to their CRM system, companies can tweak each of these variables to suit a specific customer segment, such as those who have expressed interest in a particular product.

Edit variables. Image credits: Tavus
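
As a rough illustration of the variable-driven personalization described above, here is a minimal Python sketch that fills a script template from CRM-style records for one customer segment. It does not use Tavus’ actual API: the script wording, the contact records, the `first_name` field and the `render_scripts` helper are all hypothetical and exist only to show the idea.

```python
from string import Template

# Hypothetical script with placeholders for the kinds of variables the
# article mentions (location, company, product). Not Tavus' format.
SCRIPT = Template(
    "Hi $first_name, thanks for your interest in $product. "
    "I'd love to show how it can help the team at $company in $location."
)

# Hypothetical CRM export; a real integration would pull these records
# from a system such as Salesforce or Mailchimp.
CRM_CONTACTS = [
    {"first_name": "Ana", "company": "Acme", "location": "Lisbon",
     "product": "Analytics Suite"},
    {"first_name": "Ben", "company": "Globex", "location": "Austin",
     "product": "Billing Hub"},
    {"first_name": "Chloe", "company": "Initech", "location": "Leeds",
     "product": "Analytics Suite"},
]


def render_scripts(template, contacts, segment_product):
    """Return one personalized script per contact in the chosen segment.

    The segment here is simply "contacts interested in a given product".
    """
    return [
        template.substitute(contact)
        for contact in contacts
        if contact["product"] == segment_product
    ]


if __name__ == "__main__":
    for script in render_scripts(SCRIPT, CRM_CONTACTS, "Analytics Suite"):
        print(script)
```

Each rendered script would then be handed to the video model in place of a fresh recording, which is what lets one training video serve many recipients.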

Companies can create hundreds of these replicas featuring different staff members, set against different backgrounds for different target markets.

Through the in-app editor, any number of different scripts can be created to attach to each use case — without having to re-record any of the original video.

The different avatars of Tavus. Image credits: Tavus

While this core SaaS product isn’t going away, Tavus is today lifting the lid on a new supercharged version of its technology along with the first installment of a series of developer APIs that allow third parties to integrate Tavus into their own applications.

Replica

The first part of Tavus’ new developer platform to arrive is the “replica API,” which is all about creating “photorealistic” digital replicas for text-to-video generation. With this, a company can clone a person (e.g., a chief marketing officer or CEO) using a new proprietary model created by Tavus called “Phoenix,” which is based on a deep learning method called neural radiance fields (NeRF). This can create a 3D construction of a person from 2D images in just a few minutes.

“It allows you to essentially create entire videos with just two minutes of training data, which is a big leap forward from how we previously did personalization at scale,” Raza told TechCrunch. “And now all you have to do is record two minutes of training data and it will create a complete copy of you. And once you have a copy, you can make as many videos as you want — from one, two, or a thousand scenarios.”

Simulation showing how Tavus’ Phoenix NeRF model maps a user’s face to create a realistic replica. Image credits: Tavus

Tavus’ Phoenix model builds a 3D model using 2D video input via neural radiance fields (NeRF). Image credits: Tavus

The initial replica API draws on the full functionality of the Phoenix model and captures the movement of a person’s entire face, including cheeks, nose, eyebrows, and lips.

“Moving your whole face leads to realism, naturalness and quality. When you speak, your face expresses emotion beyond your moving lips,” explained Raza. “If you want to create an entire video from a script, one where you’re talking, that looks natural and is incredibly high quality, you’d want to use the replica API.”

However, Tavus is also developing a number of additional APIs, including one specifically for lip-syncing, one for dubbing, and one for mass, personalized video campaigns.

The lip-sync API will have a “lower cost of entry,” according to Raza, and is better for situations where a “high degree of quality and realism” isn’t required.

The dubbing API, meanwhile, also uses the lip-sync model but adds multilingual voice cloning, which means a monolingual user can send video campaigns in any language using their own voice. In this case, since most of the video will remain the same, the API allows for simple replacement of lip movements to align with the different sounds coming from the user’s mouth. This could prove useful for makers of a video editing software suite, for example, who wish to let their users add lip-syncing, editing and dubbing to their videos.

The video campaign API then essentially bundles the replica API together with a number of additional tools, such as hosting, variable mapping, thumbnails, and analytics, for those looking to launch large-scale video campaigns.

“We’re enabling any developer to deliver an end-to-end video campaign experience through their own solutions,” said Raza. “While the replica and lip-sync APIs are more of a ‘model-as-a-service’ model, the campaign API gives you tools to easily build an AI video campaign platform.”

Raza remained tight-lipped about who some of the early adopters of the Tavus platform are, but said it is “partnering with one of the biggest video platforms” for customer engagement. “They’re trying to bring that to their millions of customers who already use their platform to create videos on a daily basis,” Raza said.

Deepfake dilemma

Intuitively, platforms like Tavus are ripe for abuse. After all, what’s to stop someone from uploading a pre-existing video to create a digital replica? Deepfakes are indeed a growing concern in the burgeoning AI movement, but Raza says the company has put controls in place to prevent misuse. For example, when a user submits the two-minute training material, they must also submit a specific verbal consent statement, which is then checked against the audio in the training material to ensure there is a match.

“We run these checks automatically and then do a human check on every copy that goes through the automated checks to ensure security,” Raza said.

It’s easy to see how this could work with Tavus as a standalone SaaS application, but now that it’s a platform that any number of companies can access through an API, who then controls the verification? Well, as it turns out, Tavus does: the company wants to keep its hands on the verification wheel, even if it’s just providing the engine to third-party developers.

“We perform the same checks and take responsibility for verifications with [the] API too,” Raza continued.

Extending reality

While OpenAI has pretty much become the public face of generative AI, there is more than enough room for different players to bring something different to the mix. Indeed, while DALL-E and OpenAI’s recently introduced Sora model are mostly about helping people create visuals from text prompts, Raza says Tavus’ raison d’etre is more about “extending” the reality of a person.

“We see a future where everyone wants to have a digital copy of themselves that they control and have full authority over,” Raza said. “And it will be important that it ends up capturing more and more of your personality, more and more of your gestures and features. That’s how we see things going forward: there will be models that create things that don’t exist, and then there will be models that extend your reality.”

With $18 million in the bank, Raza said the recent cash injection will be used to “fuel the fire that’s already burning” at Tavus Towers.

“We’re an AI research company, so we want to be able to continue development on newer models like Phoenix,” Raza said. “But then there’s also just sustaining our growth; we’ve had a ton of demand the whole time. And we want to be able to keep hiring for our machine learning and engineering teams to support our developers and SaaS customers.”
