With AI infrastructure buildouts reaching staggering proportions, there’s more pressure than ever on companies to squeeze as much inference as possible out of their GPUs. And for researchers with expertise in a particular efficiency technique, it’s a great time to raise funding.
That’s part of the driving force behind Tensormesh, launching out of stealth this week with $4.5 million in funding. The round was led by Laude Ventures, with additional angel funding from database pioneer Michael Franklin.
Tensormesh is using the money to build a commercial version of the open source LMCache utility, started and maintained by Tensormesh co-founder Yihua Cheng. Used correctly, LMCache can reduce inference costs by as much as 10x — a strength that has made it a staple in open source deployments and drawn heavy-hitting integrations from the likes of Google and Nvidia. Now Tensormesh aims to turn that academic reputation into a viable business.
The core of the product is the key-value cache (or KV cache), a memory system used to process complex inputs more efficiently by condensing them down to their key values. In traditional architectures, the KV cache is discarded at the end of each query — but Tensormesh co-founder and CEO Junchen Jiang argues that this is a huge source of inefficiency.
“It’s like having a very smart analyst who reads all the data, but forgets what they’ve learned after each question,” says Jiang.
Rather than discarding this cache, Tensormesh’s systems hold onto it so it can be reused when the model performs a similar process on a separate query. Because GPU memory is so precious, this can mean spreading the data across several different storage tiers, but the payoff is significantly more inference power for the same server load.
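To make the idea concrete, here is a minimal, purely illustrative sketch of KV-cache reuse — this is not LMCache’s actual API or architecture, just a toy model of the principle: caches are keyed by prompt prefix, so a later query that shares a prefix with an earlier one can skip recomputing that portion.

```python
from collections import OrderedDict

class PrefixKVCache:
    """Toy illustration of KV-cache reuse across queries (not LMCache)."""

    def __init__(self, capacity=128):
        # OrderedDict gives simple LRU eviction, standing in for the
        # multi-tier storage a production system would use.
        self.store = OrderedDict()
        self.capacity = capacity

    def _compute_kv(self, tokens):
        # Stand-in for the expensive attention key/value computation.
        return [(tok, hash(tok)) for tok in tokens]

    def _longest_cached_prefix(self, tokens):
        # Search from the full query down to a one-token prefix.
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self.store:
                self.store.move_to_end(key)  # mark as recently used
                return end, self.store[key]
        return 0, []

    def run_query(self, tokens):
        """Return (kv, reused): the KV entries and how many were cached."""
        reused, kv = self._longest_cached_prefix(tokens)
        # Only the uncached suffix needs fresh computation.
        kv = kv + self._compute_kv(tokens[reused:])
        self.store[tuple(tokens)] = kv
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return kv, reused
```

In this sketch, a first query pays the full computation cost, while a follow-up query that extends the same conversation reuses everything already cached and only computes the new tokens — the same intuition behind keeping the KV cache alive between requests.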
The change is particularly powerful for chat interfaces, since models must continually refer back to the growing conversation log as a chat progresses. Agentic systems face a similar problem, with an ever-growing record of actions and goals.
In theory, these are changes AI companies could make on their own — but the technical complexity makes it a daunting task. Given the Tensormesh team’s research on the problem and the sheer intricacy of the details, the company is betting there will be heavy demand for an out-of-the-box product.
“Keeping the KV cache on a secondary storage system and reusing it efficiently without slowing down the whole system is a very difficult problem,” says Jiang. “We’ve seen people hire 20 engineers and spend three or four months building such a system. Or they can use our product and do it very efficiently.”
