Stanford adjunct professor and repeat founder Zain Asgar just raised an $80 million Series A round for a startup that tackles the AI inference bottleneck in a clever way. The round was led by Menlo Ventures.
The company, Gimlet Labs, has created what it claims is the first and only “multi-silicon inference cloud”: software that allows an AI workload to run simultaneously on several types of hardware. It can split the work of an AI application across traditional CPUs, AI-tuned GPUs, and high-memory systems.
“We’re basically dealing with any different hardware that’s available,” Asgar told TechCrunch.
A single agent can combine multiple steps, and each “requires different hardware: Inference is compute-bound, decoding is memory-bound, and tool calls are network-bound,” lead investor Tim Tully of Menlo writes in a blog post about the funding.
No one chip does it all yet, but as new hardware comes out and old GPUs are retooled, “the multi-silicon fleet is ready — it just lacks the software layer to make it work.” That’s what Tully believes Gimlet Labs offers.
If the current build-more-compute trend continues, data center spending will reach nearly $7 trillion by 2030, McKinsey estimates. Asgar says applications use existing, already-deployed hardware only “somewhere between 15 and 30 percent” of the time.
“Another way to think about it: you’re wasting hundreds of billions of dollars because you’re just letting resources sit idle,” he said. “Our goal was basically to try to figure out how you can make AI workloads 10 times more efficient than ever before, today.”
So he and his co-founders, Michelle Nguyen, Omid Azizi, and Natalie Serrino, began building orchestration software that splits up agents’ workloads so they can be deployed across all kinds of hardware simultaneously.
Gimlet Labs claims to reliably speed up AI inference by 3x to 10x for the same cost and power. Gimlet says it can even slice the underlying model to run on different architectures, using the best chip for each part of the model.
The company has already partnered with chip makers NVIDIA, AMD, Intel, ARM, Cerebras and d-Matrix.
Gimlet’s product, delivered either as software or via an API in its own Gimlet Cloud, is not intended for the rank-and-file AI application developer. It is aimed at the largest AI model labs and data centers.
The company launched publicly in October with, he said, eight-figure revenue out of the gate (so at least $10 million). Asgar said his customer base has more than doubled in the past four months and now includes a major model maker and an ultra-large cloud computing company, though he declined to name them.
The co-founders previously worked together at Pixie, a startup that created an open source observability tool for Kubernetes. Pixie was acquired by New Relic in 2020, just two months after launching with a $9 million round led by Benchmark. (Pixie’s technology is now part of the open source organization that oversees Kubernetes.)
After Asgar met Tully by chance about a year ago and also received angel investment from Stanford professors, VCs started calling. After the launch, a term sheet landed on Asgar’s desk. When VCs heard Asgar was considering offers, “we got a pretty big flood of funding” and the round was quickly oversubscribed, he said.
Including the previous round, the startup has now raised a total of $92 million, with angel investors including Sequoia’s Bill Coughran, Stanford professor Nick McKeown, former VMware CEO Raghu Raghuram, and Intel CEO Lip-Bu Tan. The company currently employs 30 people.
Other investors include Factory, which led the seed, Eclipse Ventures, Prosperity7 and Triatomic.
