Nvidia on Thursday unveiled what it called the world’s most powerful AI supercomputer yet, a giant machine named Perlmutter for NERSC, aka the US National Energy Research Scientific Computing Center.
“Perlmutter’s ability to fuse AI and high performance computing will lead to breakthroughs in a broad range of fields from materials science and quantum physics to climate projections, biological research and more,” Nvidia CEO Jensen Huang gushed.
The $146m super will be built in two stages, though it can be used to a degree right now.
The first phase involves engineers from HPE assembling the infrastructure to house the machine and installing 1,536 compute nodes, each containing four NVLink-3-connected Nvidia A100 Tensor Core GPUs and one AMD Milan Epyc processor. That's a total of 6,159 of Nvidia’s latest A100 GPUs and 1,536 AMD server chips, making it capable of reaching four exaFLOPS of AI performance at FP16 precision, we're told.
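For readers who want to check those phase-one numbers, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above; the variable names are ours, and the per-GPU value is simply the quoted total divided by the quoted GPU count, not an Nvidia specification.

```python
# Back-of-the-envelope check of the phase-one figures quoted in the article.
# All inputs come from the text; the variable names are illustrative only.

GPU_NODES = 1_536          # phase-one compute nodes
GPUS_PER_NODE = 4          # A100 GPUs per node
QUOTED_GPU_TOTAL = 6_159   # total GPU count as quoted
QUOTED_AI_FLOPS = 4e18     # "four exaFLOPS" of AI performance at FP16

# GPUs accounted for by the compute nodes alone.
gpus_in_compute_nodes = GPU_NODES * GPUS_PER_NODE

# Difference between the quoted total and the per-node tally.
unaccounted_gpus = QUOTED_GPU_TOTAL - gpus_in_compute_nodes

# Implied FP16 throughput per GPU in teraFLOPS, assuming the quoted
# total is spread evenly across every quoted GPU.
implied_tflops_per_gpu = QUOTED_AI_FLOPS / QUOTED_GPU_TOTAL / 1e12

print(f"GPUs in compute nodes: {gpus_in_compute_nodes:,}")
print(f"Quoted total minus node GPUs: {unaccounted_gpus}")
print(f"Implied FP16 per GPU: {implied_tflops_per_gpu:.0f} teraFLOPS")
```

The node count multiplied by four comes to 6,144, a little short of the 6,159 quoted; the figures given here do not say where the remaining 15 GPUs sit.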
The second phase will see the machine kitted out with more CPU cores later this year. Another 3,072 compute nodes will be added, each with two AMD Milan processors and 512 GB of memory. Dion Harris, Nvidia’s global HPC and AI product marketing lead, told The Register that, once Perlmutter is complete, Nvidia expects the machine will probably rank somewhere within the top five supercomputers in the Top 500 list. Supers on this list are ranked according to their performance at FP64 precision.
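The phase-two hardware figures can be tallied with a little arithmetic. This is a rough sketch using only the numbers in the article; the variable names are ours, and because cores per processor are not stated, the tally stops at processors and memory.

```python
# Rough tally of the CPU-side hardware described in the article.
# Inputs are the quoted figures; the variable names are illustrative only.

PHASE_ONE_CPUS = 1_536       # one AMD Milan chip per phase-one node
CPU_NODES = 3_072            # phase-two compute nodes
CPUS_PER_CPU_NODE = 2        # AMD Milan processors per phase-two node
MEMORY_GB_PER_NODE = 512     # memory per phase-two node

phase_two_cpus = CPU_NODES * CPUS_PER_CPU_NODE
total_cpus = PHASE_ONE_CPUS + phase_two_cpus

# Aggregate phase-two memory, in GB and in decimal petabytes (1 PB = 1,000,000 GB).
phase_two_memory_gb = CPU_NODES * MEMORY_GB_PER_NODE
phase_two_memory_pb = phase_two_memory_gb / 1_000_000

print(f"Phase-two processors: {phase_two_cpus:,}")
print(f"Processors across both phases: {total_cpus:,}")
print(f"Phase-two memory: {phase_two_memory_gb:,} GB (~{phase_two_memory_pb:.2f} PB)")
```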
Perlmutter will be operated at the Lawrence Berkeley National Laboratory. It’s named after Saul Perlmutter, a physicist working at the lab and the University of California, Berkeley, who won the Nobel Prize in 2011 for uncovering evidence that the expansion of the universe is accelerating.
One of the supercomputer’s main projects will be to build on the physicist’s research by constructing the largest known three-dimensional simulation of the universe to date. Researchers will feed the machine images snapped by the Dark Energy Spectroscopic Instrument, a device built onto the four-metre Nicholas Mayall Telescope at the Kitt Peak National Observatory that will capture light from some 30 million galaxies.
Cosmologists can use the telescope's images to calculate the distances between these objects to uncover the effect of dark energy on the expansion of the universe. The rate of expansion, described by the Hubble constant, is a hotly debated topic as scientists continue to disagree on its value.
Perlmutter will process the dark-energy instrument's images and help researchers orient the telescope to snap new regions. The sensor is expected to gather up to 150,000 data points every night; manually inspecting the light spectrum from each of the galaxies is an impossible task, hence the need for the supercomputer to automate it. Lawrence Berkeley National Lab scientists hope that by using Perlmutter, they will be able to home in on parts of the data to draw conclusions more quickly – in a matter of days rather than weeks or months.
Rollin Thomas, a data architect at NERSC working to accelerate the team’s software on the system, said the GPUs will accelerate the number-crunching process. “I’m really happy with the 20x speedups we’ve gotten on GPUs in our preparatory work,” he said.
The supercomputer supports OpenMP and Nvidia’s HPC SDK, a suite of compilers and software libraries designed to accelerate scientific computing code written in C++ and Fortran on GPUs. Rapids, another framework from Nvidia that works with the computer, is aimed at data science applications in Python.
"The Perlmutter system will play a key role in advancing scientific research in the US and is front and center in a number of critical technologies, including advanced computing, artificial intelligence, and data science," a spokesperson from the Lawrence Berkeley National Laboratory, told The Register.
"The system will also be heavily used in studies of the climate and the environment, clean energy technologies, semiconductors and microelectronics, and quantum information science."
Perlmutter will become NERSC's flagship supercomputer, and supersede the 30-petaFLOPS Cori system installed in 2016. Cori will be wound down and retired eventually. ®