Nvidia H100-based Henri supercomputer tests AMD’s claim on Green500
In this race it's all about flops per watt
SC22 There’s a new energy-efficiency king at the top of this fall’s Green500 ranking of the most green supercomputers in the world, and it's a tiny 31-kilowatt cluster powered by Nvidia’s H100 GPUs.
Developed by Lenovo for the Flatiron Institute in New York, the two-petaflop Henri system is the first and only system using Nvidia’s Hopper GPU architecture to make it onto this fall’s Supercomputing leaderboards.
The system itself is really more of an HPC cluster than a supercomputer on the scale of Frontier or LUMI. Based around Lenovo’s ThinkSystem SR670 V2 server platform, each node pairs two 32-core Ice Lake Xeon Scalable processors with four of Nvidia’s 80GB H100 GPUs. With a total of 5,920 cores between the CPUs and GPUs, Henri is the second-smallest system on the list.
But unlike the Top500, which prioritizes sheer performance, the Green500 weighs that performance against a system’s power consumption, ranking systems based on how many gigaflops they can squeeze out of each watt.
At just over 65 gigaflops per watt, Henri managed to squeak past Oak Ridge National Laboratory's TDS testbed, the prior efficiency champion. And that's despite the fact that Henri is only achieving 37.6 percent of its 5.4-petaflop maximum theoretical performance.
This suggests there’s a substantial amount of grunt being left on the table, and if Flatiron or Lenovo can manage better than linear performance scaling, the system’s efficiency should also improve over time.
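For readers who want to check the arithmetic, here is a rough back-of-envelope sketch in Python. It uses only the rounded figures quoted above (5.4 petaflops peak, 37.6 percent of peak achieved, 31 kilowatts), so it will not match the official Green500 entry exactly. The function and variable names are ours, purely for illustration.

```python
# Back-of-envelope check using the rounded figures quoted in this article.
# The official Green500 entry is based on more precise measurements,
# so small differences are expected.

PEAK_PFLOPS = 5.4      # maximum theoretical performance (from the article)
HPL_FRACTION = 0.376   # share of peak actually achieved (from the article)
POWER_KW = 31.0        # system power draw, rounded (from the article)


def gflops_per_watt(pflops: float, kilowatts: float) -> float:
    """Convert petaflops and kilowatts into gigaflops per watt."""
    gigaflops = pflops * 1e6   # 1 petaflop = 1,000,000 gigaflops
    watts = kilowatts * 1e3    # 1 kilowatt = 1,000 watts
    return gigaflops / watts


achieved_pflops = PEAK_PFLOPS * HPL_FRACTION
efficiency = gflops_per_watt(achieved_pflops, POWER_KW)

print(f"Achieved: {achieved_pflops:.2f} petaflops")
print(f"Efficiency: {efficiency:.1f} gigaflops per watt")
```

With these rounded inputs the sketch lands at roughly two petaflops and a little over 65 gigaflops per watt, in line with the figures reported above.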
Clues to Hopper’s efficiency
Comparing Henri to the next most efficient system also using Intel CPUs and Nvidia GPUs — the Atos THX.A.B cluster — the H100-based system is about 59 percent more efficient.
While it’s difficult to tell how much of that can be attributed to Nvidia’s new Hopper architecture in the H100, it certainly looks promising.
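To put that gap in absolute terms, the short Python sketch below works backwards from the two rounded numbers quoted in this article: Henri's roughly 65 gigaflops per watt and its roughly 59 percent advantage. The result is an implied estimate, not THX.A.B's official Green500 score, and the names used are hypothetical.

```python
# Work backwards from a relative advantage to the implied baseline.
# Both inputs are rounded figures from this article, so the result is
# only an estimate, not an official Green500 measurement.

HENRI_GFLOPS_PER_WATT = 65.0   # "just over 65", rounded down here
HENRI_ADVANTAGE = 0.59         # "about 59 percent more efficient"


def baseline_from_advantage(value: float, advantage: float) -> float:
    """If value = baseline * (1 + advantage), return the baseline."""
    return value / (1.0 + advantage)


implied_thx = baseline_from_advantage(HENRI_GFLOPS_PER_WATT, HENRI_ADVANTAGE)
print(f"Implied THX.A.B efficiency: about {implied_thx:.0f} gigaflops per watt")
```

In other words, "59 percent more efficient" puts the Atos machine somewhere around 41 gigaflops per watt on these rounded numbers.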
As we’ve seen with systems like Australia's Setonix, GPU acceleration has an outsized impact on efficiency compared to CPU compute. Setonix’s CPU-only configuration ranked at 338 on this fall’s Green500, while its GPU-accelerated configuration came in at number four.
With that said, a larger system using the same hardware as Henri probably won’t scale linearly. As systems grow larger it’s pretty common for performance overheads to eat into the overall efficiency of the system. For example, the full-sized Frontier supercomputer at ORNL is 74 times bigger than the Frontier TDS system, but is roughly 16 percent less efficient.
Another unknown is how Nvidia’s H100s will perform when paired with faster PCIe 5.0-equipped CPUs like Intel’s Sapphire Rapids.
AMD dominates Green500, again
As usual, the Green500 saw a fair bit of reshuffling this fall. But as we saw with this spring’s ranking, HPE’s AMD-based systems continued to power the majority of the 10 most efficient systems.
Frontier TDS, Adastra, Setonix - GPU, Dardel - GPU, Frontier, and LUMI, which now hold the second through seventh positions on the Green500, are all powered by HPE’s Cray EX235a platform, which pairs AMD’s 64-core Epyc 3 Milan CPUs with the chipmaker’s Instinct MI250x GPUs.
France’s number-10 ranked Champollion system, which is based on HPE’s Apollo server platform, is also using AMD Epyc processors but opts for Nvidia’s A100 GPUs over Instinct.
Aside from Henri, the only non-AMD systems to make the top 10 were Atos’ THX.A.B in eighth place and MN-3 in ninth. Both systems are equipped with Intel Xeon Scalable processors.
Change on the horizon
AMD's dominance at the top of the Green500 may not last much longer. Next-gen CPUs and GPUs from the likes of Intel, AMD, and Nvidia will be making their way into systems over the next few months.
This week we learned that the Adastra system would be among the first to deploy AMD’s Epyc 4 Genoa CPUs. The chips promise a 14 percent IPC uplift across 50 percent more cores, but it's hard to say whether that will be enough to maintain AMD’s lead.
Intel's HBM-stacked Xeon Scalable processors and Ponte Vecchio GPUs, now called Xeon Max and Data Center GPU Max, are already making their way to Argonne National Laboratory for integration into the Aurora supercomputer. Los Alamos National Laboratory’s (LANL) Crossroads machine will also use Intel’s Xeon Max processors.
Likewise, the first supercomputers powered by Nvidia’s Grace and Grace-Hopper Superchips, including LANL's Venado system, are expected to launch sometime next year.
Depending on how these chips perform in HPC applications, the Green500 could look very different come next spring. ®