As AI increasingly moves from the cloud to the device, how exactly is anyone supposed to know whether this or that new laptop will run a GenAI-powered app faster than competing off-the-shelf laptops — or desktops or all-in-ones, for that matter? The knowledge could mean the difference between waiting a few seconds to create an image versus a few minutes — and as they say, time is money.
MLCommons, the industry group behind a series of AI-related hardware benchmarking standards, wants to facilitate comparison with the release of performance benchmarks aimed at “customer systems” — i.e. consumer computers.
Today, MLCommons announced the formation of a new working group, MLPerf Client, whose goal is to create AI benchmarks for desktops, laptops, and workstations running Windows, Linux, and other operating systems. MLCommons promises that the benchmarks will be “scenario-driven,” focusing on real end-user use cases and “driven by community feedback.”
To that end, the first MLPerf Client benchmark will focus on text-generating models, specifically Meta’s Llama 2, which MLCommons executive director David Kanter notes is already integrated into MLCommons’ other benchmarking suites for data center hardware. Meta has also done extensive work on Llama 2 with Qualcomm and Microsoft to optimize the model for Windows — to the benefit of devices running Windows.
“The time is ripe to bring MLPerf to customer systems as artificial intelligence becomes an expected part of computing everywhere,” Kanter said in a press release. “We look forward to working with our members to bring the excellence of MLPerf to customer systems and promote new features for the wider community.”
Members of the MLPerf Client working group include AMD, Arm, Asus, Dell, Intel, Lenovo, Microsoft, Nvidia and Qualcomm — but notably not Apple.
Apple is also not a member of MLCommons, and a Microsoft engineering director (Yiannis Minadakis) co-chairs the MLPerf Client group — which makes the company’s absence unsurprising. The disappointing result, however, is that whatever AI benchmarks the MLPerf Client group comes up with won’t be tested on Apple devices — at least not anytime soon.
Still, this author is curious to see what kind of benchmarks and tools emerge from MLPerf Client, with macOS support or not. Assuming GenAI is here to stay — and there’s no indication that the bubble is going to burst anytime soon — I wouldn’t be surprised to see these types of metrics play an increasing role in device purchase decisions.
At their best, MLPerf Client benchmarks could become something like the many PC build comparison tools on the web, giving an indication of the AI performance one can expect from a particular machine. Maybe they’ll expand to cover phones and tablets in the future, given the involvement of Qualcomm and Arm (both heavily invested in the mobile device ecosystem). It’s clearly early days — but here’s hoping.