With so much money pouring into AI startups, it’s a good time to be an AI researcher with an idea to try. And if the idea is fairly new, it may be easier to get the resources you need as an independent company rather than in one of the big labs.
Case in point: Inception, a startup developing diffusion-based AI models, has just raised $50 million in seed funding. The round was led by Menlo Ventures, with participation from Mayfield, Innovation Endeavors, Microsoft’s M12 fund, Snowflake Ventures, Databricks Investment, and Nvidia’s venture arm NVentures. Andrew Ng and Andrej Karpathy provided additional angel funding.
The project is led by Stanford professor Stefano Ermon, whose research focuses on diffusion models, which produce results through iterative refinement rather than word by word. These models power image-based AI systems such as Stable Diffusion, Midjourney, and Sora. Ermon has been working on diffusion models since before the AI boom made them exciting, and Inception is his effort to apply the same approach to a wider range of tasks.
Along with the funding, the company released a new version of its Mercury model, designed for software development. Mercury is already integrated into a number of development tools, including ProxyAI, Buildglare, and Kilo Code. More importantly, Ermon says the diffusion approach will help Inception’s models excel on two of the most important metrics: latency (response time) and cost.
“These diffusion-based LLMs are much faster and much more efficient than what everyone is building today,” says Ermon. “It’s just a completely different approach where there’s a lot of innovation that can still be brought to the table.”
Understanding the technical difference requires a little background. Diffusion models are structurally different from autoregressive models, which dominate text-based AI services. Autoregressive models such as GPT-5 and Gemini work sequentially, predicting each subsequent word or word fragment based on the previously processed material. Diffusion models, best known for image generation, take a more holistic approach, incrementally refining the overall structure of a response until it matches the desired result.
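The contrast can be sketched in a toy example. This is not Inception's actual method, just a conceptual illustration in which random word choice stands in for model sampling and `[MASK]` placeholders stand in for the noisy starting state of a diffusion model:

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]

def autoregressive_generate(n_tokens):
    """Sequential decoding: each token is produced after the previous
    one, so generating n tokens takes n dependent steps."""
    out = []
    for _ in range(n_tokens):
        out.append(random.choice(VOCAB))  # stand-in for model sampling
    return out

def diffusion_generate(n_tokens, n_steps=4):
    """Iterative refinement: start from a fully masked draft and revisit
    every position on each pass; the step count, not the token count,
    determines how many dependent passes are needed."""
    draft = ["[MASK]"] * n_tokens
    for _ in range(n_steps):
        # each pass may revise any position, and all positions can
        # in principle be updated in parallel
        draft = [random.choice(VOCAB)
                 if t == "[MASK]" or random.random() < 0.5 else t
                 for t in draft]
    return draft

print(autoregressive_generate(8))  # 8 tokens after 8 sequential steps
print(diffusion_generate(8))       # 8 tokens after 4 refinement passes
```

The key structural point is in the loop bounds: the autoregressive loop runs once per token, while the diffusion loop runs a fixed number of refinement passes regardless of output length.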
Conventional wisdom holds that autoregressive models are the right tool for text applications, and that approach has been extremely successful for recent generations of AI models. However, a growing body of research suggests that diffusion models may perform better when a model is processing large amounts of text or operating under data constraints. As Ermon tells it, these properties become a real advantage when working on large codebases.
Diffusion models also have more flexibility in how they use hardware, an especially important advantage as the infrastructure demands of AI become clear. Where autoregressive models must execute operations one after another, diffusion models can process many operations simultaneously, allowing significantly lower latency on complex tasks.
“We’ve benchmarked at over 1,000 tokens per second, which is way higher than anything that’s possible using existing auto-regression technologies,” says Ermon, “because our thing is built to be parallel. It’s built to be really, really fast.”
