Aurora dawns late: Half-baked entry secures second in supercomputer stakes
Half the machine, quadruple the anticipation for all-Intel super
SC23 After years of delays, Argonne National Laboratory's all-Intel "Aurora" supercomputer has finally graced the Top500 ranking of the world's most powerful publicly known supercomputers — just not where many had hoped to see it.
The system, which features Intel's high-bandwidth memory (HBM)-equipped Xeon Max processors and GPU Max accelerators, managed 585 petaFLOPS of double-precision performance in the Linpack benchmark — or at least half of it did. Argonne, which completed installation of Aurora in late June, has only submitted Linpack results for about half the system. The full system is expected to exceed two exaFLOPS of peak performance.
"Typically when you deploy systems like Aurora [with] 60,000 GPUs, it takes about seven to nine months to get to complete stability and correct tuning of the system," Ogi Brkic, the VP of Intel's supercomputing group, told journos during a pre-briefing. "We completed the build of the system in June; in a full month we were able to do a lot."
SC23 at a glance
- Next Platform: Analysis of the latest Top500 global supercomputers
- Next Platform: Nvidia announces H200 Hopper AI accelerator with more HBM
- Next Platform: Will Isambard 4 be UK's first true exascale machine?
- The Register: Intel details UK's Dawn AI supercomputer
- The Register: HPE, Nvidia offer 'turnkey' supercomputer for AI training
- The Register: Fujitsu says it can optimize CPU and GPU use to minimize execution time
- The Register: A $5 VM can get you into the top 10 supercomputers ... of 1993
The system's arrival at Argonne this summer and the Top500 this fall comes after years of delays and redesigns. The machine was supposed to come online in 2021, but has been delayed repeatedly by Intel's challenges bringing chips to market.
At one point the system was slated to deliver 180 petaFLOPS of double precision performance using 50,000 "Knights Hill" Xeon Phi many-core CPUs, but later pivoted to a more traditional CPU-plus-GPU design.
But as our sibling site The Next Platform has pointed out on more than one occasion, while it may not be the most efficient system — producing 585 petaFLOPS required 24.6 megawatts — it'll be one of the cheapest exascale-class supercomputers ever, coming in at $200 million. That's, of course, after Intel Federal, the prime contractor on the project, took a $300 million write-off on the scheme.
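For those who want to reproduce the efficiency arithmetic, here's a quick back-of-the-envelope check in Python using only the figures quoted above. The FLOPS-per-watt number it prints is the same metric the Green500 ranking uses; the figures themselves come from the article, not from any official submission.

```python
# Back-of-the-envelope efficiency check using the figures quoted
# in the article: 585 petaFLOPS of Linpack at 24.6 megawatts.
linpack_pflops = 585    # FP64 Linpack result, in petaFLOPS
power_mw = 24.6         # reported power draw, in megawatts

# 1 petaFLOPS = 1e6 gigaFLOPS; 1 megawatt = 1e6 watts
gflops_per_watt = (linpack_pflops * 1e6) / (power_mw * 1e6)
print(f"{gflops_per_watt:.1f} GFLOPS per watt")  # ~23.8
```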
Despite the partial showing, Aurora has managed to claim the number two spot, ousting Japan's 442 petaFLOPS Fujitsu A64FX-based "Fugaku" supercomputer that previously held that position.
All of this means that Oak Ridge National Laboratory's 1.2 exaFLOPS Frontier system has retained the top spot in the biannual ranking for a fourth consecutive time. The system arrived at the summit of the Top500 in early 2022 and is powered by AMD's 64-core Epyc 3 silicon and Instinct MI250X accelerators.
Unfortunately, those hoping to see how well Intel's complete system performs will have to wait until at least next June's Top500 ranking. And while we wait, Intel and Argonne will need to more than double the system's performance if it has any hope of beating Frontier, or competing with Lawrence Livermore National Laboratory's AMD MI300A-powered "El Capitan" system.
Top500 gets a shakeup
Aurora isn't the only new supercomputer contending for a place at the top of the heap on this fall's ranking. In fact, the Top500 received quite a shakeup compared to the last few years of relative computational stability.
Of the ten fastest supercomputers in the Top500 ranking, four (Aurora, Microsoft Azure's "Eagle" system, EuroHPC's "MareNostrum 5 ACC", and Nvidia's "EOS" system) are new or have been upgraded significantly since this spring.
Next to Aurora, Eagle is the most powerful of these new systems. The cloud-based super claimed the number three spot with a Linpack score of 561 petaFLOPS, which it squeezed from its 56-core Xeon 8480C processors and Nvidia H100 GPUs. In fact, Eagle is the highest ranked cloud system in the history of the Top500 and the fastest H100-equipped system to compete for a spot in the top 10.
With that said, Eagle isn't the first cloud cluster to breach the top ten list. Microsoft's Voyager-EUS2 claimed the number ten spot two years ago. However, with just 30 petaFLOPS of FP64 grunt, that system is nearly 19x slower than Eagle.
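The "nearly 19x" figure follows directly from the two Linpack scores quoted above, as a one-liner confirms:

```python
# Ratio of the two FP64 Linpack scores cited in the article.
eagle_pflops = 561       # Microsoft Eagle
voyager_pflops = 30      # Microsoft Voyager-EUS2

ratio = eagle_pflops / voyager_pflops
print(f"Eagle is roughly {ratio:.1f}x faster")  # roughly 18.7x
```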
Given just how many H100s Microsoft has been deploying to power its AI search and enterprise products, Redmond's return to the upper echelons of the Top500 are hardly surprising.
- Developing AI models or giant GPU clusters? Uncle Sam would like a word
- Ventana bumps performance on Veyron RISC-V silicon to surely speed up servers
The team working on the AMD-powered "LUMI" system at CSC in Finland has upgraded the kit on several occasions and has consistently extracted double-digit petaFLOP performance improvements each year since the system's arrival on the Top500 last spring. The machine is reportedly now 150 percent faster than when it was first deployed.
Looking down the stack, EuroHPC's MareNostrum 5 ACC and Nvidia's EOS supers now slot in between the IBM-Nvidia powered Summit and Sierra systems, with 138 petaFLOPS and 121 petaFLOPS respectively. The pair share a lot in common with Microsoft's Eagle platform, as they use a combination of 4th-gen Intel Xeons, Nvidia's H100 accelerators, and InfiniBand networking.
This year's ranking also represents a bit of a role reversal for Intel and AMD. Previously Intel CPUs were used in just two of the top 10 systems. Now Intel processors are used in five, while AMD processors power Frontier and LUMI; Fugaku uses custom Arm cores; and IBM's Power9 chips still underpin the Summit and Sierra machines.
More disruption to come
With the race to build ever larger GPU clusters to power public and private AI development, next year's Top500 rankings look likely to be headed toward another shakeup.
In addition to Aurora, there are several new exascale and pre-exascale class systems slated to come online in the US and Europe over the next year. Two of the most anticipated are the El Capitan system we mentioned earlier and Europe's Jupiter system.
El Capitan was one of the first systems to showcase AMD's Instinct MI300A APU. The chip combines 24 of AMD's Zen 4 cores — the same ones used in its Genoa Epycs, down to the dies themselves — with six CDNA 3 GPU dies and 128GB of HBM3 memory. The Next Platform predicts the system will deliver peak theoretical FP64 performance of 2.3 exaFLOPS when fully operational.
Europe's first exascale super will also begin installation in 2024, though it's not yet clear whether the system will be finished and fine-tuned in time to rank at ISC or SC24. That system will be built by Atos and powered by SiPearl's Arm-based Rhea processors and Nvidia's GH200 Superchip. ®