Nvidia continues its quest to shoehorn AI into everything, including HPC
GPU giant contends that a little fuzzy math can speed up fluid dynamics, drug discovery
SC24 Nvidia on Monday unveiled several new tools and frameworks for augmenting real-time fluid dynamics simulations, computational chemistry, weather forecasting, and drug development with everyone's favorite buzzword: AI.
The announcements underscore an ongoing effort by Nvidia to not only accelerate HPC workloads that are traditionally run on CPUs with its GPUs, but to reduce the time and energy required to complete them using machine learning anywhere and everywhere possible.
According to Dion Harris, who heads up datacenter product marketing at Nvidia, the performance gains from even a little fuzzy math can be substantial.
In the case of computational chemistry, Nvidia says it was able to calculate 16 million structures 100x faster using its AI-accelerated Alchemi containers or NIMs compared to running the workload on GPUs without AI acceleration.
We've discussed NIMs at length in the past, but in a nutshell, Nvidia inference microservices are container images with all the frameworks, libraries, and dependencies necessary to achieve a desired goal. We bring this up because NIMs are quickly becoming Nvidia's preferred way of packaging its software products.
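For a sense of what that looks like in practice, here's a minimal, hypothetical sketch of calling a NIM once its container is up and running locally. The endpoint path and request body below are placeholders of our own invention, since each NIM documents its own REST API; the point is simply that the "microservice" half of the name means you talk to it over HTTP.

```python
# Hypothetical sketch of consuming a locally running NIM over its REST API.
# The URL and payload schema are placeholders -- consult the specific NIM's
# documentation for the real routes and request format.
import requests

NIM_URL = "http://localhost:8000/v1/infer"  # hypothetical local NIM endpoint

payload = {"input": "example request body; the real schema depends on the NIM"}

resp = requests.post(NIM_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```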
Other NIMs announced at SC24 include containers for its Earth-2 CorrDiff and FourCastNet weather models, and DiffDock 2.0 for protein docking simulations.
In another example, Harris pointed to Nvidia's Omniverse blueprints for computer-aided engineering, which use multiple AI models to achieve real-time simulations of things like computational fluid dynamics.
"Normally, this kind of simulation would take weeks or even months just for a single car," Harris claimed.
The efficiency gains are large enough, and any loss of resolution minor enough, that Nvidia has convinced HPC software giant Ansys to integrate the frameworks into its own fluid simulation platform.
"Altair, Cadence, Siemens, and others are exploring how to integrate these blueprints into their own services and products for design acceleration," Harris said.
Of course, the use of mixed precision and AI in HPC to solve bigger, more complex problems using less compute is nothing new. Researchers working on climate models have seen some of the most promising gains from these approaches. However, shifting the broader HPC community toward this way of thinking is also in Nvidia's best interests.
AI is driving massive revenues in Nvidia's datacenter business, which in turn is reflected in the company's design decisions. Blackwell is a prime example. When it comes to double-precision grunt, its latest generation of GPUs and Superchips is a mixed bag: FP64 vector performance is up, at 45 teraFLOPS, but for double-precision matrix math the chip is actually a step back from the H100 and H200.
This has put Nvidia at a bit of a disadvantage to AMD, which not only makes CPUs for the stubbornest of HPC apps that refuse to transition, but whose GPUs and APUs promise significantly higher double-precision performance. The MI325X is arguably the most comparable part to Nvidia's Blackwell generation, with 81 teraFLOPS of vector and 163 teraFLOPS of matrix performance at double precision.
At the other end of the spectrum, Nvidia's FLOPS count has exploded as it has traded precision for sheer performance, with its top-specced Blackwell GPUs each touting 20 petaFLOPS at FP4.
Nvidia's HPC strategy, it seems, is not to compete head-on with AMD in a much smaller market, but to convince software vendors that its mix of fuzzy matrix math and modest double-precision performance is actually better, given the right implementation.
This is by no means a new course for Nvidia. The company's contributions to the HPC community following the debut of CUDA in 2007 were influential in the rise of GPUs not just in supercomputing, but the enterprise and cloud at large.
Breaking into the HPC space at a time when it was dominated by CPU-based architectures required building new frameworks and adapting software to run on GPUs.
In many ways Nvidia's strategy hasn't actually changed; the company has simply become more creative in how it applies that software, and where it makes sense that means leaning on machine learning. Where it doesn't make sense, yet, Nvidia has stuck with adapting existing frameworks to accelerated compute.
The latest example of this is cuPyNumeric, a "drop-in replacement" for the ubiquitous NumPy library.
"NumPy is the foundation library for math model computing for Python developers. It's used by over five million scientific industrial developers, with 300 million downloads last month alone," Harris said, adding that, despite its ubiquity, it can be challenging to scale the library across a multi-GPU cluster.
Nvidia claims cuPyNumeric allows NumPy programs to automatically scale across larger clusters without having to resort to low-level distributed computing libraries.
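As a rough illustration of what "drop-in" means here, the sketch below assumes the cupynumeric package is installed and simply swaps the NumPy import; the stencil loop itself is ordinary NumPy-style array code.

```python
# Minimal sketch of the "drop-in replacement" idea: swap the NumPy import for
# cuPyNumeric and leave the rest of the script untouched. Assumes the
# cupynumeric package is installed.
import cupynumeric as np   # instead of: import numpy as np

# A simple Jacobi-style stencil sweep -- the kind of array code cuPyNumeric
# aims to distribute across GPUs without explicit MPI or NCCL calls.
grid = np.zeros((4096, 4096))
grid[0, :] = 1.0  # hot boundary

for _ in range(100):
    grid[1:-1, 1:-1] = 0.25 * (
        grid[:-2, 1:-1] + grid[2:, 1:-1] + grid[1:-1, :-2] + grid[1:-1, 2:]
    )

print(float(grid.mean()))
```

Nvidia's documentation describes launching scripts like this through the Legate runtime, which is what partitions the arrays across multiple GPUs or nodes rather than anything in the script itself.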
Nvidia this week also extended support for accelerated dynamic simulations in its CUDA-Q platform for quantum systems. "GPU-accelerating these comprehensive qubit simulations allows researchers to test new quantum processor designs," Harris said. "To simulate 50 design iterations would have previously taken about a year. Now, you can run it in less than an hour."
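For readers unfamiliar with the platform, here's a minimal sketch of CUDA-Q's Python interface, assuming the cudaq package and an Nvidia GPU backend are available; it shows the basic kernel-and-sample workflow rather than the newly announced dynamics solver.

```python
# Minimal CUDA-Q sketch: build and sample a small entangling circuit on a
# GPU-accelerated simulator backend. Illustrative only -- the dynamics
# simulations described above use a separate solver interface.
import cudaq

cudaq.set_target("nvidia")  # GPU-accelerated statevector simulator

@cudaq.kernel
def ghz(num_qubits: int):
    qubits = cudaq.qvector(num_qubits)
    h(qubits[0])
    for i in range(num_qubits - 1):
        x.ctrl(qubits[i], qubits[i + 1])
    mz(qubits)

counts = cudaq.sample(ghz, 10, shots_count=1000)
print(counts)
```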
Google is among the first to put CUDA-Q to work running large-scale quantum simulations on Nvidia's Eos supercomputer. ®