Nvidia PUE-PUEs datacenter efficiency ratings, calls for application-specific metrics

What good is great power usage effectiveness if your DC is packed with inefficient kit?

ISC Power usage effectiveness – PUE for short – has long been the spec by which datacenter efficiency has been measured. But after nearly two decades, Nvidia believes it's time for a new metric.

PUE itself is a rather simple measure of efficiency. It describes how much of the power consumed by a datacenter goes toward compute, storage, or networking equipment – stuff that makes money – versus things that don't, like facility cooling. The closer the PUE is to 1.0, the more efficient the facility.

For example, a facility located in an arid climate may have a higher – and therefore worse – PUE because more power is required by air handlers to keep the datacenter cool. Meanwhile, a facility in the Nordics that can take advantage of free cooling year-round would have a lower PUE.
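To put numbers on that, PUE is conventionally calculated as total facility power divided by the power delivered to IT equipment. The sketch below is only an illustration: the function name and both sets of figures are made up, not measurements from any real site.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Two hypothetical sites, each carrying 1,000 kW of IT load.
# The hot-climate site spends 500 kW on cooling and power conversion,
# the free-cooled site only 100 kW. Both overheads are assumed figures.
hot_climate_site = pue(total_facility_kw=1500.0, it_equipment_kw=1000.0)
free_cooled_site = pue(total_facility_kw=1100.0, it_equipment_kw=1000.0)

print(f"hot-climate site PUE: {hot_climate_site:.2f}")
print(f"free-cooled site PUE: {free_cooled_site:.2f}")
```

A PUE of exactly 1.0 would mean every watt entering the building reaches the IT equipment, which is why lower is better.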

But while Nvidia acknowledges that PUE has helped to drive improvements in computing infrastructure and reduce power lost to cooling, it's far from perfect.

"PUE doesn't measure the useful output of a datacenter, only the energy that it consumes. That'd be like measuring the amount of gas an engine uses without noticing how far the car has gone," the AI giant argued in a blog post on Monday.

The argument effectively boils down to: datacenter metrics should take into account how effectively a facility is able to convert watts into work rather than how much of that power is going toward compute versus cooling and power conversion. Or, put another way, nobody cares how good your PUE is if your datacenter is filled with inefficient servers that are terrible at turning watts into FLOPS.
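A back-of-the-envelope sketch makes the point. It assumes, purely for illustration, that only 1/PUE of each watt drawn at the meter reaches the servers, and it uses invented GFLOPS-per-watt figures for two hypothetical facilities. This is not a metric Nvidia has proposed, just one way to picture the argument.

```python
def work_per_facility_watt(pue: float, gflops_per_it_watt: float) -> float:
    """Useful work per watt drawn by the whole facility.

    Only 1/PUE of each facility watt reaches the IT equipment,
    so server efficiency is scaled down by the PUE.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return gflops_per_it_watt / pue


# Hypothetical facility A: excellent PUE, but old and inefficient servers.
facility_a = work_per_facility_watt(pue=1.1, gflops_per_it_watt=10.0)

# Hypothetical facility B: mediocre PUE, but far more efficient servers.
facility_b = work_per_facility_watt(pue=1.4, gflops_per_it_watt=50.0)

print(f"facility A: {facility_a:.1f} GFLOPS per facility watt")
print(f"facility B: {facility_b:.1f} GFLOPS per facility watt")
```

Under these assumed numbers, facility B does several times more work per watt despite having the worse PUE.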

Nvidia isn't alone in this line of thinking either. The brain behind the metric believes PUE has run its course. "It improved datacenter efficiency when things were bad. But two decades later, they're better, and we need to focus on other metrics more relevant to today's problems," datacenter engineer Christian Belady – who recently joined liquid cooling vendor Iceotope as an advisor – was quoted as saying.

When it comes to measuring work, we already have a couple of options readily available to us: millions of instructions per second (MIPS) and FLOPS being the two most common. For instance, the Green500 uses FLOPS/watt as measured during the High-Performance Linpack benchmark in its ranking of the world's most efficient supercomputers of public record.

While useful, Nvidia believes that the benchmark in question needs to be application specific. In other words, FLOPS/watt in double precision floating point isn't exactly representative of the performance datacenter operators can expect when running AI inference at INT8 or FP4 at scale.

Going off Nvidia's earlier analogy, it'd be a bit like comparing MPGs in the city to MPGs on the highway. Both are useful metrics, but only in the appropriate context.

"The holy grail is a performance metric. You can't compare different workloads directly, but if you segment by workloads, I think there is a better likelihood for success," Belady argued in the post.

As such, Nvidia contends that using something like MLPerf rather than Linpack to arrive at a performance metric like tokens per joule or watt hour might make more sense for those running AI workloads.
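Tokens per joule and tokens per watt hour are the same measurement in different units, since one watt hour is 3,600 joules. The sketch below shows the arithmetic; the token count, average power draw, and run duration are hypothetical values, not MLPerf results.

```python
JOULES_PER_WATT_HOUR = 3600.0  # 1 Wh = 1 W sustained for 3,600 seconds


def tokens_per_joule(tokens: int, avg_power_w: float, duration_s: float) -> float:
    """Tokens generated per joule of energy consumed during a run."""
    energy_j = avg_power_w * duration_s  # energy = average power x time
    if energy_j <= 0:
        raise ValueError("energy consumed must be positive")
    return tokens / energy_j


# Hypothetical inference run: 1.2 million tokens in 10 minutes
# on a system averaging 4 kW.
tpj = tokens_per_joule(tokens=1_200_000, avg_power_w=4000.0, duration_s=600.0)
tpwh = tpj * JOULES_PER_WATT_HOUR

print(f"{tpj:.2f} tokens per joule")
print(f"{tpwh:.0f} tokens per watt hour")
```

Whether the power figure is taken at the accelerator, the server, or the facility meter changes the result considerably, so any such metric would need to state where the measurement was made.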

It's worth noting that Nvidia's latest generation of datacenter GPUs is heavily skewed toward lower precision data types, as opposed to the double precision workloads commonly seen in the scientific computing field. So naturally the GPU giant is going to want to prioritize benchmarks that are not only relevant, but cast its parts in a positive light. Of course, its Grace Hopper Superchips are dominating the Green500 right now, so it's not like they aren't effective at double precision.

In addition to these metrics being application specific, Nvidia makes the case that they need to evolve over time so they don't become irrelevant.

Nvidia isn't the first to suggest new benchmarks for measuring the power efficiency of compute clusters. Last November, HPC experts grappled with whether it was time for the Green500 to expand its scope, to account for more diverse workloads.

While the Green500 got its start with the Linpack benchmark, the Top500 has since added benchmarks like the High Performance Conjugate Gradient (HPCG) and HPL-MxP – what used to be called HPL-AI – to provide additional context in both double and mixed precision workloads. Whether and when the Green500 might make a similar change remains to be seen.

Nvidia's call for new datacenter efficiency metrics comes as public awareness of AI's energy consumption reaches fresh highs. Datacenter power consumption is currently forecast to double by 2026 and industry experts have warned that unchecked datacenter growth will strain the grid unless modernization steps are taken now.

Uranium magnates are even talking up the AI boom as a boon for the nuclear energy industry, while cloud providers like Microsoft are hedging their bets on small modular reactors. ®
