It's time to stop fearing CPU power management
It's (probably) not going to kill your latency and it could save you a buck
Feature Over the past few years, we’ve seen the thermal design power (TDP) of all manner of chips creeping steadily higher as chipmakers fight to keep Moore's Law alive, while continuing to deliver higher core counts, faster frequencies, and instructions per clock (IPC) improvements on schedule.
Over the span of five years, we’ve seen chipmakers push CPUs from 150-200W to as much as 400W in the case of AMD’s 4th-gen Epyc. And during that same period, we’ve seen the rapid rise of accelerated compute architectures that employ GPUs and other AI accelerators.
Following this trend, it’s not hard to imagine per-socket power consumption in excess of 1kW within the next year or two, especially as AMD, Intel, and Nvidia work to finalize their accelerated processing unit (APU) architectures and meld datacenter GPUs with CPUs.
The idea of a 1kW part might seem shocking, and it’ll almost certainly require direct liquid cooling or perhaps even immersion cooling. Yet higher TDPs aren’t inherently bad, so long as performance per watt also improves and keeps scaling with the extra power.
Just because it can doesn’t mean it should
But just because a CPU can burn 400W under load doesn’t mean it needs to. While Intel and AMD both boosted the TDP of their fourth-gen parts, they also introduced a number of power management enhancements that make it easier for customers to prioritize sheer performance or optimize for efficiency.
“We have a couple of mechanisms in our [Epyc 4] part, like Power Determinism and Performance Determinism,” explained Robert Hormuth, VP of AMD’s datacenter solutions group, in an interview with The Register. “Depending on what customers want in their datacenter behavior, we give them some knobs that they can use to do that.”
In a nutshell, Epyc 4 can be tuned either to hold performance consistent or to hold power consumption constant, modulating clock speeds as more or fewer cores are loaded.
Intel, meanwhile, has introduced an “Optimized Power Mode” to its Sapphire Rapids Xeon Scalable processors, which the company claims can reduce per-socket power consumption by as much as 20 percent, in exchange for a performance hit of roughly 5 percent.
According to Intel Fellow Mohan Kumar, the power management feature is particularly effective in scenarios where the CPUs are only running at 30-40 percent utilization. With Optimized Power Mode enabled, he says customers can expect to see a 140W reduction in power consumption on a dual socket system.
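Intel's two figures line up if you assume top-bin parts: Sapphire Rapids tops out at a 350W TDP, and 20 percent of a dual-socket pair of those is exactly the number Kumar cites. A quick sanity check (the 350W figure is our assumption, not stated in the article):

```python
# Sanity-check Intel's claimed Optimized Power Mode savings.
# Assumption (not from the article): 350 W per-socket TDP,
# the top of the Sapphire Rapids range.
tdp_per_socket_w = 350
sockets = 2
reduction = 0.20  # "as much as 20 percent" per socket

savings_w = tdp_per_socket_w * sockets * reduction
print(savings_w)  # 140.0 -- matching the dual-socket figure Kumar cites
```

Lower-TDP SKUs would see proportionally smaller absolute savings, which is why the headline number is best read as a ceiling.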
Of course, CPU-level power management doesn’t exactly have the best track record.
“IT operators are risk averse. The general reaction to power management is for the small savings we get in energy and costs, the risk in terms of our service level agreements with our customers is too high. And so, there’s a hesitancy to go into using power management,” Uptime Institute analyst Jay Dietrich told The Register. “There’s usually an urban legend associated with those beliefs that involves an SLA disaster three technology generations in the past.”
The result is that IT managers end up leaving power management functions off as a general rule — even when many systems don’t have strict latency requirements.
It’s true many power management features can introduce undesirable levels of latency, but that is not necessarily a problem for every workload. Intel's Kumar argues that Sapphire Rapids’ Optimized Power Mode is something most customers can use without concern, except in the case of latency-sensitive workloads. Kumar says customers who run such apps should evaluate the feature to determine whether the CPU can deliver acceptable performance and latency with it turned on.
According to Uptime Institute's Jay Dietrich, it’s really about time this became the default procedure for IT buyers when procuring new equipment. “What we encourage is, just like you’re going to work with your vendors and make choices based on performance characteristics — like latency — you should also test power management and make a determination ‘Can I use power management for these workloads?’” he said.
Many CPUs now support per-core C-states, Dietrich adds. “So, if I’m running at 50 percent utilization and you only need half, the chip will actually shut down half the cores you don’t need, and that’s a significant savings.”
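Dietrich's 50 percent example is easy to put rough numbers on. The sketch below models a chip whose idle cores are parked in a deep sleep state; the per-core wattages are purely illustrative assumptions, not vendor figures, but they show why parking half the cores saves close to half the core power:

```python
# Back-of-envelope: savings from parking idle cores in a deep C-state.
# Both power numbers below are illustrative assumptions, not vendor data.
ACTIVE_W = 4.0  # assumed per-core power when busy
C6_W = 0.1      # assumed per-core power in a deep (C6-style) sleep state

def core_power(total_cores: int, busy_cores: int) -> float:
    """Total core power with idle cores parked in a deep C-state."""
    idle_cores = total_cores - busy_cores
    return busy_cores * ACTIVE_W + idle_cores * C6_W

all_cores_on = 96 * ACTIVE_W        # no power management: every core active
half_loaded = core_power(96, 48)    # 48 cores busy, 48 parked
print(all_cores_on, half_loaded)    # roughly 384 W vs ~197 W of core power
```

On Linux, actual per-core C-state residency can be inspected under `/sys/devices/system/cpu/cpu*/cpuidle/`, which is one way to verify the hardware really is parking cores as expected.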
Setting aside the environmental implications of the human race’s compute demands, there are more practical reasons why these power management features are worth exploring.
For one, most existing datacenters aren’t designed to accommodate systems as powerful and compute-dense as those available today.
During its Epyc 4 launch event last year, AMD painted a picture in which customers could consolidate multiple racks into just one. While the prospect of packing two or three racks of aging systems into a single cabinet full of Epycs or Xeons — the logic isn’t unique to AMD — is rather attractive, most datacenters simply aren’t equipped to power or cool the resulting rig.
“It’s a very real challenge for the industry,” Dietrich says. “Historically what's been done in an enterprise situation is you start breaking down your racks. When I refresh with a higher energy footprint, I put 15 servers in the rack or 10 servers in the rack instead of 20.”
As CPUs and GPUs grow ever more power hungry, the number of systems you can fit into a typical six-kilowatt rack drops sharply.
Factoring in RAM, storage, networking and cooling, it’s not hard to imagine a 2U, two-socket Epyc 4 platform consuming well in excess of a kilowatt. This means it’d only take five or six nodes — 10 to 12 rack units — before you’ve used up your rack power budget. Even assuming that not all those systems will be fully loaded at the same time — they probably won’t be — and you overprovision the rack, you’re still going to run out of power before the rack is half full. And that’s just looking at general compute nodes. GPU nodes are even more power hungry.
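The arithmetic above can be sketched in a few lines. The 1.1kW node draw and the 1.5x overprovisioning ratio are our assumptions (the article says only "well in excess of a kilowatt"), but any plausible numbers tell the same story for a standard 42U cabinet:

```python
# Rack power-budget arithmetic for dense dual-socket nodes.
# NODE_W and OVERPROVISION are assumptions for illustration.
import math

RACK_BUDGET_W = 6000   # the typical six-kilowatt rack cited above
RACK_UNITS = 42        # standard full-height cabinet
NODE_W = 1100          # assumed draw of a loaded 2U dual-socket Epyc 4 node
NODE_U = 2
OVERPROVISION = 1.5    # assume nodes rarely all hit peak draw at once

nodes_nameplate = RACK_BUDGET_W // NODE_W  # sized for worst-case draw
nodes_overprov = math.floor(RACK_BUDGET_W * OVERPROVISION / NODE_W)

print(nodes_nameplate, nodes_nameplate * NODE_U)  # 5 nodes in 10U
print(nodes_overprov, nodes_overprov * NODE_U)    # 8 nodes in 16U
print(nodes_overprov * NODE_U < RACK_UNITS / 2)   # still under half full
```

Even with aggressive overprovisioning, the power budget is exhausted at 16U of a 42U rack, which is the "half full" problem the article describes.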
Of course, datacenter operators aren’t blind to this reality, and many are taking steps to upgrade their new and existing infrastructure to support hotter, more power-dense systems. But it is going to take time for datacenter operators to adjust and may even drive adoption of these power management features to reduce operating costs. ®