Researchers weigh new benchmarks for Green500 amid shifting workload priorities
Just because it's super efficient at Linpack doesn't mean it'll be in everything
SC23 Is it time for the Green500 to expand its scope to account for more diverse workloads? This was one of the questions attendees grappled with at SC23.
Similar to the Top500, which ranks systems based on sheer performance, the Green500 weighs that performance against a system's power consumption in terms of gigaFLOPS per watt.
High Performance Linpack has long been the gold standard for testing the performance of compute clusters in exacting double-precision workloads. So when the Green500 launched in 2007, it made sense to use Linpack as the basis for evaluating the efficiency of these systems.
The problem is that Linpack is only one benchmark, and it isn't representative of all workloads. This is why benchmarks like High Performance Conjugate Gradient (HPCG) and HPL-MxP (formerly HPL-AI) have cropped up over the years to provide additional context for both traditional double-precision and mixed-precision workloads.
But while we've found new ways to benchmark supercomputers in terms of performance, the Green500 remains tied to Linpack. That, however, may not be the case for much longer.
Trying an alternative approach
Over the past 18 to 24 months, there has been a growing movement to broaden the scope of the Green500 to alternative workloads, Wu-chun Feng explained during a presentation at SC23.
"Mike Heroux and Jack Dongarra in particular have broached this subject about looking at the Green500 using HPCG," he said. "Satoshi Matsuoka has been talking about 'all these benchmarks are important; can we come up with some type of composite Green500 number of FLOPS per watt by somehow combining the numbers we get from the different benchmarks?'"
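Nobody at SC23 spelled out how such a composite would be calculated. One plausible option, sketched below purely as an assumption, is a geometric mean of per-benchmark efficiency figures, the same kind of averaging SPEC CPU uses to fold many sub-scores into one. The function name and the benchmark labels in the example are hypothetical, and the inputs are placeholder values, not measurements.

```python
import math

def composite_gflops_per_watt(scores: dict[str, float]) -> float:
    """Hypothetical composite: geometric mean of per-benchmark GFLOPS/W figures.

    A geometric mean keeps a benchmark that produces very large numbers
    (e.g. a low-precision one) from swamping one that produces small numbers.
    """
    if not scores:
        raise ValueError("need at least one benchmark score")
    if any(v <= 0 for v in scores.values()):
        raise ValueError("efficiency figures must be positive")
    # Averaging in log space is equivalent to the n-th root of the product.
    log_mean = sum(math.log(v) for v in scores.values()) / len(scores)
    return math.exp(log_mean)

# Placeholder values chosen for easy arithmetic, not real results.
example = {"HPL": 4.0, "HPCG": 1.0, "HPL-MxP": 16.0}
print(f"{composite_gflops_per_watt(example):.2f}")
```

An arithmetic mean would be simpler, but it would let whichever benchmark posts the biggest raw numbers dominate the composite, which is exactly what a broader ranking would presumably want to avoid. Any real scheme might also weight the benchmarks differently.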
HPCG is designed to more closely reflect real-world performance across a wide variety of HPC workloads. Look at HPCG results and the scores are substantially lower than you'd expect from Linpack. The rankings shift, too: Japan's Fugaku comes in fourth on Linpack but first on HPCG, beating out Frontier.
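As its name suggests, HPCG times a conjugate gradient solve. The real benchmark runs a preconditioned solver (a multigrid preconditioner with a symmetric Gauss-Seidel smoother) over a 3D 27-point stencil; the sketch below is a deliberate simplification, an unpreconditioned solver on a 1D Laplacian, with hypothetical function names. It shows the shape of the work: each iteration is one sparse matrix-vector product, two dot products, and a few vector updates. Each value fetched from memory feeds only a couple of floating-point operations, so this kind of code tends to be limited by memory bandwidth rather than peak FLOPS, which is the usual explanation for HPCG scores sitting far below Linpack's.

```python
import math

def laplacian_matvec(x: list[float]) -> list[float]:
    """y = A x for the 1D Laplacian (2 on the diagonal, -1 beside it), matrix-free."""
    n = len(x)
    y = []
    for i in range(n):
        v = 2.0 * x[i]
        if i > 0:
            v -= x[i - 1]
        if i < n - 1:
            v -= x[i + 1]
        y.append(v)
    return y

def dot(a: list[float], b: list[float]) -> float:
    return sum(p * q for p, q in zip(a, b))

def conjugate_gradient(b: list[float], tol: float = 1e-10, max_iter: int = 1000):
    """Unpreconditioned CG for A x = b, where A is the 1D Laplacian above."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x starting at zero
    p = list(r)          # first search direction is the residual itself
    rr = dot(r, r)
    iterations = 0
    while math.sqrt(rr) > tol and iterations < max_iter:
        Ap = laplacian_matvec(p)                  # the sparse matrix-vector product
        alpha = rr / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr                        # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iterations += 1
    return x, iterations

# Build a right-hand side whose exact solution is all ones, then solve it.
n = 20
b = laplacian_matvec([1.0] * n)
x, iters = conjugate_gradient(b)
print(iters, max(abs(xi - 1.0) for xi in x))
```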
It's important to remember that for the Green500, the workload (whether it's Linpack or HPCG) is there just as much to drive power consumption as it is to measure performance. Different benchmarks exercise infrastructure such as accelerators and network fabrics to different degrees. As such, testing methodologies may need to be adjusted to accommodate alternative workloads.
That's a complex undertaking, but Feng noted that "HPCG presents an opportunity to innovate from a software perspective in order to deliver energy efficiency."
While Feng didn't touch on HPL-MxP in much detail, there also appears to be an opportunity to address workloads that can use lower-precision floating-point math to run faster than a typical FP64 application.
Looking at modern accelerators, it's not hard to see why. Nvidia's H100, for instance, sports up to 67 teraFLOPS of FP64 tensor core performance, but drop down to FP8 and you're looking at roughly 2 petaFLOPS, or about 4 petaFLOPS with sparsity enabled.
Scientists at the University of Bristol have demonstrated the advantages of running climate models at half precision. But the biggest beneficiary of lower precision is undoubtedly AI training and inference, especially for models that take advantage of sparsity.
It's therefore not hard to imagine a system that's incredibly efficient in mixed-precision workloads but performs rather poorly in traditional HPC benchmarks. And, as with HPCG, incorporating HPL-MxP into the Green500 ranking will likely require a new testing methodology.
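The trick behind HPL-MxP is to do the expensive O(n³) LU factorization in low precision, then claw back double-precision accuracy through iterative refinement, with residuals computed in FP64. The sketch below assumes classic iterative refinement (the benchmark's own refinement scheme differs in detail) and uses FP32, simulated in pure Python by rounding every operation through `struct`, as the "low" precision; real runs typically factor in even lower precisions such as FP16 on accelerator hardware. All function names are hypothetical.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (IEEE FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_f32(A):
    """LU with partial pivoting, every arithmetic result rounded to FP32."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])          # multiplier, stored as L
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm

def lu_solve_f32(LU, perm, b):
    """Solve with the FP32 factors: permute, forward-substitute, back-substitute."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):                                   # unit lower triangle
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):                         # upper triangle
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y

def mixed_precision_solve(A, b, refinements=5):
    LU, perm = lu_factor_f32(A)               # O(n^3) work, done once in low precision
    x = lu_solve_f32(LU, perm, b)             # first guess, accurate only to ~FP32
    for _ in range(refinements):
        # Residual r = b - A x, computed in full FP64 from the original matrix.
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = lu_solve_f32(LU, perm, r)         # O(n^2) correction reusing FP32 factors
        x = [xi + di for xi, di in zip(x, d)] # update accumulated in FP64
    return x

A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
x_true = [0.1, -0.2, 0.3, 0.7]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
x = mixed_precision_solve(A, b)
print(max(abs(xi - ti) for xi, ti in zip(x, x_true)))
```

The point for efficiency rankings is that nearly all the FLOPs land in the low-precision factorization, which is why a machine's mixed-precision efficiency can look very different from its FP64 Linpack figure.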
Henri maintains its lead on the Green500
Despite the excitement surrounding Aurora's arrival on the Top500 ranking of supercomputers, there weren't nearly as many surprises with regard to this fall's Green500.
The Flatiron Institute's two-petaFLOPS Henri system retained its top spot. The 31-kilowatt Lenovo ThinkSystem cluster squeezed 65 gigaFLOPS per watt from its 5,920 Nvidia H100 and Ice Lake Xeon cores.
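The Green500 metric itself is simple division: sustained Linpack performance over the power drawn during the run. As a rough sanity check using only the rounded figures quoted for Henri (about 2 petaFLOPS at about 31 kilowatts), a minimal sketch with a hypothetical helper function:

```python
def gflops_per_watt(rmax_pflops: float, power_kw: float) -> float:
    """Efficiency in GFLOPS/W from sustained performance (PFLOPS) and power (kW)."""
    gflops = rmax_pflops * 1e6   # 1 PFLOPS = 1,000,000 GFLOPS
    watts = power_kw * 1e3       # 1 kW = 1,000 W
    return gflops / watts

# Rounded inputs from the article, not the official submission figures.
print(f"{gflops_per_watt(2.0, 31.0):.1f} GFLOPS/W")
```

That works out to roughly 64.5 gigaFLOPS per watt, in line with the reported 65; the small gap comes from rounding the inputs.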
That said, two new systems did move into the top 10 most efficient supers. One was EuroHPC's MareNostrum 5 ACC which, in addition to claiming the number eight spot on the Top500, managed to displace Frontier for sixth place on the Green500.
Built by Eviden, the system uses an arrangement similar to Henri's, pairing Nvidia's H100s with Intel's newer 4th-Gen Xeon Scalable processors. It achieved 54 gigaFLOPS per watt in the test.
South Korea's Olaf system was the other new system to break into the upper echelon of the Green500, claiming the number ten spot at 45 gigaFLOPS per watt.
Olaf is another Lenovo ThinkSystem machine, but instead of Intel's CPUs it pairs Nvidia H100 GPUs with AMD's 32-core Epyc Genoa processors. ®