Nvidia's future in scientific computing hinges on a melding of AI and HPC
But if they can't, AMD is well positioned to mop up
Analysis Nvidia had quite the showing at the International Supercomputing Conference in Hamburg last week. Its GH200 claimed a spot among the 10 most powerful publicly known supercomputers, while the CPU-GPU frankenchips dominated the Green500 efficiency ranking.
But Nvidia's gains in HPC may be short-lived if its next-gen Blackwell accelerators are anything to go by.
Unveiled at GTC, the parts are as speedy as they are hot, in terms of both demand and temperature. The GB200 superchip can churn out 40 petaFLOPS of peak 4-bit precision performance while sucking down 2,700W of power. Small wonder it requires liquid cooling.
The part was compelling enough that Amazon outright ditched Nvidia's first-gen superchips in favor of the Blackwell variant to power its upcoming Ceiba AI supercomputer. But while Nvidia's Blackwell GPUs are the stars of AI circles, they don't look nearly as good on paper for more traditional double-precision (FP64) HPC workloads.
Dialed up to FP64, Nvidia's GB200 superchip can only manage about 90 teraFLOPS, or roughly 45 teraFLOPS per GPU. That makes Blackwell about 32 percent slower than Hopper at crunching FP64 matrix math. Nvidia assures us that, despite initially being left off the spec sheet, the chip does in fact support FP64 vector math, and at 45 teraFLOPS it's about 32 percent faster than Hopper on that front. The takeaway: Blackwell's double-precision performance is a mixed bag compared to last gen, slower on matrix math but quicker on vector.
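Those percentages make more sense with Hopper's published peaks alongside. Here's a quick back-of-the-envelope check in Python; the Hopper figures (67 teraFLOPS matrix, 34 teraFLOPS vector) come from Nvidia's H100 datasheet, and everything here is an on-paper peak rather than measured performance:

```python
# On-paper FP64 peaks in teraFLOPS. The B200 figure is as stated above;
# the H100 figures are from Nvidia's datasheet. Peaks, not measurements.
H100_MATRIX = 67   # H100 FP64 tensor core peak
H100_VECTOR = 34   # H100 FP64 vector peak
B200 = 45          # per Blackwell GPU; two GPUs per GB200 superchip

print(f"GB200 FP64 peak: {2 * B200} teraFLOPS")                             # 90
print(f"Matrix: {(1 - B200 / H100_MATRIX) * 100:.0f}% slower than Hopper")  # ~33
print(f"Vector: {(B200 / H100_VECTOR - 1) * 100:.0f}% faster than Hopper")  # ~32
```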
None of this changes the fact that Blackwell can't hold a candle to AMD's MI300 APUs and GPUs in highly precise workloads. Launched back in December, the parts are anywhere from 2.7x to 3.6x faster at double-precision matrix math than Nvidia's Blackwell GPUs, all while consuming less power.
This tells us that Blackwell-based systems will need to be far larger than equivalent MI300 systems if they want to compete on the Top500's flagship High Performance Linpack (HPL) benchmark.
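The same arithmetic shows where AMD's on-paper advantage and the sizing problem come from, assuming AMD's published peak FP64 matrix figures for the MI300 parts:

```python
# Published peak FP64 matrix throughput per accelerator, in teraFLOPS
# (on-paper vendor figures, not measured HPL results):
B200 = 45
MI300A = 122.6   # AMD's CPU-GPU APU
MI300X = 163.4   # AMD's GPU-only part

print(f"MI300A vs Blackwell: {MI300A / B200:.1f}x")   # ~2.7x
print(f"MI300X vs Blackwell: {MI300X / B200:.1f}x")   # ~3.6x
# Put another way: matching one MI300X at peak FP64 takes roughly 3.6
# Blackwell GPUs, hence the much larger system for the same HPL score.
```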
Having said that, the writing is on the wall: Blackwell clearly wasn't designed with double precision in mind, because FP64 performance just isn't where the money is. Supercomputers earned the name because they're big, but next to the GPU clusters used to train AI models, all but the largest of them look tiny.
Nvidia hasn't forgotten its roots, but HPC workloads may be changing
Still, just because Blackwell isn't an FP64 monster doesn't mean that Nvidia is conceding the HPC market to AMD.
"We very much care about scientific computing," Dion Harris, director of Nvidia's accelerated datacenter group, told The Register. "When we have discussions internally, we're always reminded that many of our biggest innovations came from developers out of our scientific computing community."
Harris believes that, to address some of the largest and most challenging scientific quandaries, we can't just brute force the problem with double-precision grunt anymore. That's not to say FP64 performance doesn't matter.
"FP64 is important, and it's useful, but we think it's just one of the tools that you're going to need to go and tackle a lot of these grand-scale challenges," Harris said.
As the HPC community is fond of saying, high performance computing is a class of workload, and it doesn't automatically mean FP64, especially these days. There's a reason the Top500 now tracks a mixed-precision benchmark, HPL-MxP, alongside the main ranking: HPL just isn't representative of every workload.
While some simulations do require as many bits of floating-point precision as the silicon can muster, not all do. In fact, some classic HPC workloads, like meteorological forecasting, have been shown to be quite effective when running at single and even half precision.
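To see the principle at work, here's a toy illustration in Python, using the Lorenz equations (a classic chaotic stand-in for atmospheric dynamics) rather than any real weather code. Over a short horizon, the cheap single-precision run tracks the double-precision one closely:

```python
import numpy as np

def lorenz_step(state, dt):
    # One explicit Euler step of the Lorenz system (sigma=10, rho=28, beta=8/3)
    x, y, z = state
    deriv = np.array([10.0 * (y - x),
                      x * (28.0 - z) - y,
                      x * y - (8.0 / 3.0) * z], dtype=state.dtype)
    return state + state.dtype.type(dt) * deriv

for dtype in (np.float64, np.float32):
    state = np.array([1.0, 1.0, 1.0], dtype=dtype)
    for _ in range(2_000):                # two simulated seconds at dt=1e-3
        state = lorenz_step(state, 1e-3)
    print(np.dtype(dtype).name, state)    # the two trajectories land close together
```

Real forecast models are far more selective about where they cut precision, but the underlying observation is the same: many of the bits in a 64-bit float end up modeling noise rather than weather.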
The European Centre for Medium-Range Weather Forecasts and the University of Bristol have been exploring the concept of lower-precision HPC for years now.
And then, of course, there's the concept of melding low-precision AI with high-precision simulation to reduce the computational load of data-intensive workloads.
For example, you could simulate a complex or fleeting phenomenon at high precision, then use the data generated to train a model on its expected behavior. That model could then churn through mountains of data at low precision to flag the most promising data points for full-precision follow-up.
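As a minimal sketch of that loop in Python (the "simulation", the polynomial surrogate, and the promise metric below are all invented for illustration):

```python
import numpy as np

def simulate(x):
    """Stand-in for an expensive double-precision simulation."""
    return np.sin(3 * x) * np.exp(-0.5 * x ** 2)

rng = np.random.default_rng(0)

# 1. Run the costly FP64 simulation on a modest sample of inputs.
train_x = rng.uniform(-3, 3, 2_000)
train_y = simulate(train_x)

# 2. Fit a cheap surrogate to that data. A real pipeline would train a
#    neural network; a polynomial keeps this sketch dependency-free.
surrogate = np.polynomial.Chebyshev.fit(train_x, train_y, deg=15)

# 3. Screen a huge candidate set with the surrogate. In a real pipeline,
#    this inference pass is where low-precision hardware earns its keep.
candidates = np.linspace(-3, 3, 1_000_000)
scores = surrogate(candidates)
top = candidates[np.argsort(np.abs(scores))[-100:]]  # 100 "most promising"

# 4. Re-run only those promising points through the full FP64 simulation.
refined = simulate(top)
print(refined[:5])
```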
Now, not every HPC workload is going to translate to this approach, and certainly not without considerable effort. Having said that, Harris notes a few that are showing promise, including materials science and even the kinds of industrial HPC applications championed by the likes of Cadence and Ansys.
A software problem
Nvidia's success today is rooted in the lessons learned in the HPC community. It's easy to forget that Nvidia didn't just become an AI infrastructure giant overnight. Not that long ago, its primary focus was designing graphics cards capable of pushing more pixels faster across your screen.
Nvidia's rise in the datacenter is owed, in no small part, to the hard graft of taking those cards and getting applications running on them at scale.
In late 2012, Nvidia's K20 GPUs, 18,688 in total, propelled Oak Ridge National Laboratory's Titan supercomputer to the number one spot on the Top500. As our sibling site The Next Platform has previously discussed, getting there was a long and winding road.
At the time, GPUs were still a novelty in the supercomputing arena, and little of the code out there was optimized for GPU acceleration. Nvidia and its partners invested considerable effort in clearing those hurdles and tuning codes for the hardware.
And according to Harris, the same is true of mixed-precision simulation and the infusion of AI into HPC workloads.
Going forward, Nvidia's priority is addressing the broadest range of problems possible with its accelerators, running the gamut from the fuzzy math that fuels AI to the highly precise floating-point math on which simulation has traditionally relied. As for the company's ongoing relevance in scientific computing circles, it seems that will hinge on how quickly it can foster adoption of the software paradigms that make melding AI and HPC viable. ®