For years, Vyas Sekar would call Muckai Girish, an old friend from undergrad, to talk about potential startup ideas and get Girish’s opinion. The two usually talked through an idea and ended the conversation at that. When Sekar called Girish with an idea involving synthetic data in early 2022, the conversation didn’t just end when they hung up.
Sekar and his colleague at Carnegie Mellon University, Giulia Fanti, were working on creating synthetic data to remedy the reproducibility crisis, or the inability to reproduce data, in academia. While Sekar saw the need for a solution in academia, Girish knew his clients at the time were facing the same problem. Conversations with a few businesses further validated the thesis.
“At the time, I felt like this was very real and there was an opportunity,” Girish, CEO, told TechCrunch. “So that’s what got us started and over the next couple of months we talked to some investors, people we knew and most importantly businesses and we realized that this was a major problem and it was worth putting, you know, a lifetime behind it.”
The result was Rockfish, a startup that uses generative AI to create synthetic data for operational workflows to help businesses break down their data silos. Rockfish integrates with database providers including AWS and Azure, among others, and helps users choose the best configuration for their data based on company policies or uses for the data.
Synthetic data is becoming an increasingly hot topic in the AI world, but there was already growing momentum when the company launched in June 2022. Girish said Rockfish wanted to make sure it was building a product that differentiated itself from its peers and that businesses would use every day, not just every once in a while.
That’s why the company’s product is designed to ingest data continuously and focuses on operational data, which includes data on things like financial transactions, cybersecurity and supply chains. These areas are constantly generating data for companies and are also constantly changing. Girish believes the focus here helps Rockfish stand out from other competitors.
Now the company works with a handful of enterprise customers, Girish said, including streaming analytics platform Conviva, as well as government agencies such as the US Army and the US Department of Defense.
Rockfish is announcing a $4 million seed round led by Emergent Ventures with participation from Foster Ventures, TEN13 and Dallas VC, among others. This brings the company’s total funding to around $6 million.
Anupam Rastogi, managing partner at Emergent Ventures, told TechCrunch that he had been following Sekar long before he founded Rockfish. He said what made the company invest was “team, market and product, in that order.” Additionally, Rockfish’s focus on building for enterprises made it a better fit for Emergent than some of the other players in the space.
“The team is extremely high-quality, multi-Ph.D. data scientists,” Rastogi said. “This is a space that we think is very technically advanced and having that technical strength around the table is really critical. They’ve done a lot of groundwork in the space, not just in the company, but in the entire industry.”
While Rockfish hopes its focus will help give it a moat against competitors, it doesn’t change the fact that synthetic data will likely be an increasingly crowded market. AI companies are turning to synthetic data as many players believe the market has exhausted other AI training data.
There are already numerous startups looking to tackle the market, including Tonic AI, which has raised more than $45 million in venture funding; Mostly AI, which has raised $31 million in VC funding; and Hazy, which raised $14.5 million before being acquired by SAS in 2024, to name just a few.
Girish said the company wants to add to its synthetic data approach by incorporating other types of models such as state space models, mathematical models that use state variables. The company is also looking to improve its end-to-end features.
“It’s not like you’re taking random data on the Internet and creating synthetic data,” Girish said. “There is no guarantee that it will go well. But if you put it all together for business, it’s actually very relevant and realistic. So that’s the key to it, and then being able to do that on a consistent basis is what we find helpful.”