Artificial intelligence is only the latest and hungriest market for high-performance computing, and system architects are working around the clock to squeeze every drop of performance out of every watt. Swedish startup ZeroPoint, armed with €5 million (US$5.5 million) in new funding, wants to help them with a new nanosecond-scale memory compression technique, and yes, it’s exactly as complicated as it sounds.
The idea is this: losslessly compress data just before it enters RAM and decompress it afterward, effectively widening the memory channel by 50% or more while adding only a tiny amount of logic to the chip.
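To make that idea concrete, here is a toy software model of transparent compress-on-write, decompress-on-read memory. It is only an illustrative sketch: the real technique is dedicated on-chip hardware working in nanoseconds, and its codec is proprietary; zlib stands in here purely to show the lossless round trip and the effective capacity gain.

```python
import zlib

class CompressedMemory:
    """Toy model of transparent memory compression: cache lines are
    losslessly compressed on the way into 'RAM' and decompressed on
    the way out, so the rest of the system never sees compressed data."""

    def __init__(self):
        self._store = {}  # address -> compressed cache line

    def write(self, addr: int, line: bytes) -> None:
        # Compress just before the data "enters RAM".
        self._store[addr] = zlib.compress(line)

    def read(self, addr: int) -> bytes:
        # Decompress on the way back out; the caller sees the original bytes.
        return zlib.decompress(self._store[addr])

    def footprint_ratio(self) -> float:
        """Raw bytes written divided by compressed bytes actually held."""
        raw = sum(len(zlib.decompress(c)) for c in self._store.values())
        stored = sum(len(c) for c in self._store.values())
        return raw / stored

mem = CompressedMemory()
mem.write(0x00, bytes(64))                 # an all-zero 64-byte cache line
mem.write(0x40, bytes([1, 0, 0, 0] * 16))  # sparse, mostly-zero values
assert mem.read(0x00) == bytes(64)         # lossless round trip
print(f"effective capacity gain: {mem.footprint_ratio():.1f}x")
```

The essential property the article describes is visible even in this sketch: callers read back exactly the bytes they wrote, while the backing store holds far fewer bytes than were written.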
Compression is, of course, a core technology in computing. As ZeroPoint CEO Klas Moreau (left in the image above, with co-founders Per Stenström and Angelos Arelakis) pointed out, “We wouldn’t store data on a hard drive today without compressing it. Research shows that 70% of data in memory is unnecessary. So why don’t we compress the memory?”
The answer is time: we don’t have it. Compressing a large file for storage (or encoding it, as we say when it’s video or audio) is a task that can take seconds, minutes or hours depending on your needs. But data moves through memory in a fraction of a second, shifting in and out as fast as the CPU can handle it. Even a microsecond of delay to remove the “unnecessary” bits from a chunk of data entering the memory system would be disastrous for performance.
Memory doesn’t necessarily keep pace with CPU speeds, although the two (along with many other chip components) are inextricably linked. If the processor is too slow, data backs up in memory, and if memory is too slow, the processor wastes cycles waiting for the next batch of bits. They all work together, as you would expect.
While ultra-fast memory compression has been demonstrated, it raises a second problem: you have to decompress the data just as fast as you compressed it, returning it to its original state, or the system won’t know how to handle it. So unless you convert your entire architecture over to this new compressed-memory mode, it’s pointless.
ZeroPoint claims to have solved both of these problems with ultra-fast low-level memory compression that requires no real changes to the rest of the computing system. You add their technology to your chip and it’s like doubling your memory.
While the nitty-gritty details will probably only be understood by people in the field, the basics are easy enough for the uninitiated, as Moreau demonstrated when he explained it to me.
“What we do is take a very small amount of data — a cache line, sometimes 512 bits — and recognize patterns in it,” he said. “It’s the nature of data, it’s full of not-so-effective information, sparsely located information. It depends on the data: The more random it is, the less compressible it is. But when we look at most data loads, we see that we’re in the two to four times range [more data throughput than before].”
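Moreau’s point about randomness and compressibility is easy to demonstrate in software. The sketch below compresses single 64-byte (512-bit) cache lines with zlib (again, only a stand-in for ZeroPoint’s proprietary hardware codec) to show that patterned, sparse data shrinks substantially while random data does not:

```python
import os
import zlib

def compressed_size(line: bytes) -> int:
    """Compressed size in bytes of one 64-byte (512-bit) cache line."""
    return len(zlib.compress(line, 9))

zero_line = bytes(64)                  # all zeros: maximally patterned
sparse_line = bytes([1, 0, 0, 0] * 16)  # small values, mostly zero bytes
random_line = os.urandom(64)           # no patterns to exploit

for name, line in [("zeros", zero_line),
                   ("sparse", sparse_line),
                   ("random", random_line)]:
    # Patterned lines shrink well below 64 bytes; random data does not
    # (with zlib it even grows slightly due to framing overhead).
    print(f"{name:>6}: 64 -> {compressed_size(line)} bytes")
```

This mirrors the quoted observation: data full of sparse, “not-so-effective” information compresses by multiples, while purely random data is essentially incompressible.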
It’s no secret that memory can be compressed. Moreau said that everyone in large-scale computing knows about the possibility (he showed me a paper from 2012 demonstrating it), but has pretty much written it off as academic, impossible to implement at scale. But ZeroPoint, he said, has solved the problems of compaction (reorganizing compressed data to be even more efficient) and transparency, so the technology not only works but works seamlessly with existing systems. And it all happens in a handful of nanoseconds.
“Most compression technologies, both software and hardware, are on the order of thousands of nanoseconds. CXL [compute express link, a high-speed interconnect standard] can bring it down to the hundreds,” Moreau said. “We can get it down to three or four.”
Here is CTO Angelos Arelakis explaining it in his own way:
ZeroPoint’s debut is certainly timely, with companies around the world looking for faster and cheaper compute with which to train the next generation of AI models. Most hyperscalers (if we must call them that) are interested in any technology that can give them more power per watt or let them lower their power bill a bit.
The main caveat to all of this is simply that, as mentioned, the technology has to be included on the chip and built in from the ground up; you can’t just plug in a ZeroPoint dongle after the fact. To that end, the company is working with chipmakers and system integrators to license the technique and hardware design for standard high-performance computing chips.
Of course, that means the likes of Nvidia and Intel, but increasingly also companies like Meta, Google and Apple, which have designed custom hardware to run AI and other high-cost tasks internally. ZeroPoint positions its technology as a cost saver, however, not a premium: by effectively doubling the memory, the technology pays for itself in short order.
The just-closed €5 million A round was led by Matterwave Ventures, with Industrifonden acting as the local Nordic lead and existing investors Climentum Capital and Chalmers Ventures also participating.
Moreau said the money will allow the company to expand into U.S. markets, as well as double down on the Swedish ones it is already pursuing.