AMD, Nvidia, HPE tapped to triple the speed of US weather super with $35m upgrade

Scientists will focus on modelling hurricanes, wildfires, solar storms with Milan Epycs and Nv A100s

HPE will upgrade the US National Center for Atmospheric Research's (NCAR) supercomputer using AMD and Nvidia's latest CPUs and GPUs, creating a machine roughly three times as powerful as its current Intel-based beast.

That system running today, code-named Cheyenne, is four years old, and now its government-funded lab wants a bigger and faster super to forecast natural disasters unfolding on Earth. Uncle Sam shopped around and awarded the contract, worth more than $35m, to Hewlett Packard Enterprise.

The new supercomputer has yet to be given an official name, and is expected to be up and running in 2022, when it will replace Cheyenne, we're told. Kids in Wyoming, where the machine will be housed at the NCAR-Wyoming Supercomputing Center in Cheyenne, will be asked to propose a name for the thing.

The computer will be based on HPE's Cray EX (previously Shasta) supercomputer blueprints, and is expected to reach a theoretical maximum performance of 19.87 petaFLOPS – that's much faster than the 5.34 petaFLOPS Cheyenne offers today.

"That is almost 3.5 times the speed of scientific computing performed by the Cheyenne supercomputer, and the equivalent of every man, woman, and child on the planet solving one equation every second for a month," NCAR said Wednesday. "Once operational, the HPE-powered system is expected to rank among the top 25 or so fastest supercomputers in the world."
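NCAR's comparisons can be checked with some back-of-envelope arithmetic. The sketch below assumes a world population of about 7.8 billion and a 30-day month (neither figure comes from NCAR), and treats every "equation" as a single floating-point operation at the machine's theoretical peak.

```python
# Back-of-envelope check of NCAR's speed comparisons.
# Assumptions (not from NCAR): ~7.8 billion people, a 30-day month,
# and one "equation" == one floating-point operation at theoretical peak.

NEW_PEAK_PFLOPS = 19.87      # new HPE Cray EX system, theoretical peak
CHEYENNE_PEAK_PFLOPS = 5.34  # Cheyenne, theoretical peak

WORLD_POPULATION = 7.8e9                 # assumed
SECONDS_PER_MONTH = 30 * 24 * 60 * 60    # 2,592,000 seconds


def peak_speedup(new_pflops: float, old_pflops: float) -> float:
    """Ratio of two machines' theoretical peak performance."""
    return new_pflops / old_pflops


def human_equation_count(population: float, seconds: int) -> float:
    """Equations solved if everyone manages one per second for `seconds`."""
    return population * seconds


speedup = peak_speedup(NEW_PEAK_PFLOPS, CHEYENNE_PEAK_PFLOPS)

machine_flops = NEW_PEAK_PFLOPS * 1e15   # petaFLOPS -> FLOPS
human_equations = human_equation_count(WORLD_POPULATION, SECONDS_PER_MONTH)
machine_seconds = human_equations / machine_flops

print(f"Peak speedup over Cheyenne: {speedup:.2f}x")
print(f"A month of everyone's arithmetic = {machine_seconds:.2f} s at peak")
```

That prints a peak speedup of about 3.72x, and shows humanity's month of sums equals roughly one second of the new machine running flat out. The peak ratio is a little higher than NCAR's "almost 3.5 times" figure, which presumably measures delivered scientific throughput rather than theoretical peak; that reading is our assumption.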

The new machine will sport 2,570 compute nodes: 2,488 of those will contain AMD's 7nm third-gen Epyc Milan processors – due to officially launch in March – and the remaining 82 will pair Milan chips with Nvidia's 7nm A100 GPUs. It'll have a total RAM capacity of a whopping 692TB. The nodes will be linked by HPE's Slingshot interconnect, which offers 200Gbps of bandwidth per direction per switch port.
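Here's a quick sanity check of those specs, converted into a couple of other units. The per-node RAM figure is a simple average and purely illustrative: the published figures don't break memory down by node type, and CPU-only and GPU nodes may well be configured differently. Decimal units (1TB = 1,000GB, 8 bits per byte) are assumed.

```python
# Sanity checks and unit conversions for the published specs.
# Assumes decimal units: 1TB = 1,000GB and 8 bits per byte.

MILAN_ONLY_NODES = 2_488      # AMD Epyc Milan CPU nodes
MILAN_A100_NODES = 82         # Milan CPUs plus Nvidia A100 GPUs
TOTAL_NODES = 2_570
TOTAL_RAM_TB = 692
SLINGSHOT_GBITS_PER_DIRECTION = 200   # per switch port, per direction

# The two node types should add up to the quoted total.
node_sum = MILAN_ONLY_NODES + MILAN_A100_NODES

# Share of the machine that carries GPUs.
gpu_node_share = MILAN_A100_NODES / TOTAL_NODES

# Simple average only: real CPU and GPU nodes may be configured differently.
avg_ram_gb_per_node = TOTAL_RAM_TB * 1_000 / TOTAL_NODES

# Convert the port bandwidth from gigabits to gigabytes per second.
port_gbytes_per_direction = SLINGSHOT_GBITS_PER_DIRECTION / 8

print(f"Nodes add up: {node_sum == TOTAL_NODES}")
print(f"GPU nodes: {gpu_node_share:.1%} of the machine")
print(f"Average RAM per node: {avg_ram_gb_per_node:.0f}GB")
print(f"Slingshot port: {port_gbytes_per_direction:.0f}GB/s each way")
```

That works out to GPU nodes making up about 3 per cent of the machine, an average of roughly 270GB of RAM per node, and 25GB/s each way per Slingshot port.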

By using GPUs to accelerate workloads, NCAR will be moving away from its previous CPU-only approach. Although Cheyenne has more compute nodes, 4,032 in all, they're filled with Intel Xeon Broadwell workhorses, and that CPU-only design is less efficient, in terms of power and silicon required, than more modern supercomputers that mix CPUs and GPUs to crunch through math operations at high speed. Specialist hardware wins out over generic processors.
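The node counts also hint at how much more work each node does in the new design. This sketch divides each machine's theoretical peak by its node count (Cheyenne: 5.34 petaFLOPS over 4,032 nodes; the new system: 19.87 petaFLOPS over 2,570 nodes). It's a crude average that lumps the GPU nodes in with the CPU-only ones, so treat it as illustrative rather than a per-node spec.

```python
# Crude per-node comparison: theoretical peak divided by node count.
# Lumps GPU and CPU-only nodes together, so it's an average, not a spec.

SYSTEMS = {
    "Cheyenne": (5.34, 4_032),        # (peak petaFLOPS, compute nodes)
    "New HPE system": (19.87, 2_570),
}


def avg_tflops_per_node(peak_pflops: float, nodes: int) -> float:
    """Average theoretical peak per node, in teraFLOPS."""
    return peak_pflops * 1_000 / nodes


per_node = {name: avg_tflops_per_node(*spec) for name, spec in SYSTEMS.items()}

for name, tflops in per_node.items():
    print(f"{name}: {tflops:.2f} teraFLOPS per node on average")

ratio = per_node["New HPE system"] / per_node["Cheyenne"]
print(f"Roughly {ratio:.1f}x more peak compute per node")
```

On these numbers the new machine packs roughly 5.8 times the peak compute into each node on average. That says nothing directly about power draw, which isn't covered by the figures quoted here.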

"In terms of hardware, the new system will be hugely helpful to AI and machine learning," David Hosansky, a spokesperson for NCAR, told The Register.

"The big benefit of GPUs is performing large numbers of computations simultaneously on one chip, resulting in much less power usage and hardware for the same number of parallel operations. GPUs have less on-board memory than CPUs, but the ones being used in this new system are top of the line in terms of both memory and number of cores. That will allow our scientists to load more data and train larger machine learning models than they could before."

NCAR scientists will use machine-learning algorithms to model freak weather events, including hurricanes, hail storms, wildfires, and solar storms. These models ingest huge amounts of weather data to produce forecasts, and help scientists understand the impacts of climate change.

For example, AI is used to estimate the amount of moisture in vegetation, and those estimates are fed into another model that combines them with satellite observations to map out regions at risk of wildfires. Weather changes quickly, so these maps are regenerated every day.
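The two-stage approach described above, where one model's output feeds another, can be illustrated with a toy pipeline. Everything in it is hypothetical: the linear function stands in for a trained machine-learning estimator, and the satellite dryness index, weights, thresholds, and sample cells are all invented for illustration. None of it reflects NCAR's actual code or data.

```python
from dataclasses import dataclass
from typing import List

# Toy two-stage wildfire-risk pipeline. All inputs, weights and thresholds
# are hypothetical and chosen only to illustrate chaining two models.


@dataclass
class GridCell:
    temperature_c: float        # hypothetical weather input
    relative_humidity: float    # percent, hypothetical weather input
    satellite_dryness: float    # hypothetical satellite-derived index, 0..1


def estimate_fuel_moisture(cell: GridCell) -> float:
    """Stage 1: stand-in for a trained ML model (invented linear weights).

    Returns fuel moisture as a percentage, clamped at zero.
    """
    moisture = 8.0 + 0.3 * cell.relative_humidity - 0.2 * cell.temperature_c
    return max(moisture, 0.0)


def risk_level(fuel_moisture: float, satellite_dryness: float) -> str:
    """Stage 2: blend the modelled moisture with a satellite observation."""
    # Drier fuel -> higher score; moisture of 30% or more counts as fully wet.
    fuel_dryness = 1.0 - min(fuel_moisture / 30.0, 1.0)
    score = 0.6 * fuel_dryness + 0.4 * satellite_dryness
    if score >= 0.6:
        return "high"
    if score >= 0.3:
        return "moderate"
    return "low"


def daily_risk_map(cells: List[GridCell]) -> List[str]:
    """Re-run both stages for every cell, e.g. once per day."""
    return [risk_level(estimate_fuel_moisture(c), c.satellite_dryness) for c in cells]


cells = [
    GridCell(temperature_c=35, relative_humidity=10, satellite_dryness=0.9),
    GridCell(temperature_c=15, relative_humidity=80, satellite_dryness=0.1),
    GridCell(temperature_c=25, relative_humidity=40, satellite_dryness=0.5),
]
print(daily_risk_map(cells))
```

A real system would swap both toy functions for trained models and gridded satellite data; the point is only that stage two consumes stage one's output, and the whole chain can be rerun cheaply whenever fresh observations arrive.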

"We are now working on using GOES-R satellite data and machine learning to produce hourly fuel moisture content maps over [the US]," NCAR scientist Branko Kosovic, director of the Weather Systems and Assessment Program, told El Reg. "Machine learning could be used also to develop more accurate and frequent high resolution maps of fuel characteristics. Furthermore, machine learning models could be developed to improve rate of fire spread parameterizations by combining observations and theoretical considerations and developments."

For more conversations with NCAR staff, check out our sister site, The Next Platform. ®

Biting the hand that feeds IT © 1998–2021