Los Alamos finishes installing Crossroads super to test nukes without a big bang

Memory-optimized beast prioritizes weapon-sim perf over flashy FLOPS figs

After months of work unpacking, installing, and deploying the various subsystems and supporting infrastructure, Los Alamos National Laboratory's (LANL) latest super, the Crossroads system, has been installed.

This big beast is tasked with one of the US Department of Energy's (DoE) most secretive workloads: making sure America's nuclear stockpile actually works should the dark day come that it's ever needed.

The US can't exactly go out and check that its nuclear arsenal still works by setting off one of those warheads, so the DoE uses supercomputers to simulate the storage, maintenance, and efficacy of the weapons instead. (It does carry out some sub-critical physical experiments, but simulations are still needed.)

"Crossroads represents a significant advance in the nation's ability to assess the safety of the stockpile, as well as modernizing the deterrent to meet a new national security landscape," Charlie Nakhleh, associate lab director for Weapons Physics at Los Alamos, said in a write-up.

Crossroads is LANL's latest system to inherit this responsibility, taking over from the aging Trinity system. The latest system was developed by HPE's Cray division, but unlike ORNL's 1.1 exaFLOPS Frontier system or Argonne's newly installed Aurora, Crossroads isn't GPU accelerated.

(Crossroads as a name is an apt choice: Operation Crossroads was the code-name given to two atomic bomb tests by America in 1946, the first such tests after the famous Trinity detonation in 1945, depicted in this summer's smash-hit movie Oppenheimer.)

Considering most of the FLOPS delivered by modern supercomputers come from GPUs, not CPUs, this might seem like a strange omission for Crossroads. However, as researchers at LANL noted, different workloads have different bottlenecks, and when it comes to simulating nuclear weapons, memory is a big one.

"Given the hoopla in the press around the 'fastest supercomputer in the world,' one might think we should buy computers with the most FLOPS," Gary Grider, who heads up LANL's HPC division, said.

"Every class of problem requires a different balance of FLOPS, memory size, and memory access. For the problems we are working on, the time it takes to get a result is mainly determined by memory size and memory access, not FLOPS."
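Grider's point can be illustrated with a simple roofline-style estimate, in which achievable performance is capped either by peak compute or by how quickly memory can feed the cores. The sketch below is ours, not LANL's: the function name and every number in it are hypothetical placeholders chosen for illustration, not Crossroads or Trinity specifications.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline estimate: the lower of the compute ceiling and the memory ceiling."""
    memory_ceiling = bandwidth_gbs * flops_per_byte
    return min(peak_gflops, memory_ceiling)


# Hypothetical chip and workload (assumed values, not vendor figures).
PEAK_GFLOPS = 3000.0      # assumed peak FP64 rate
FLOPS_PER_BYTE = 0.25     # assumed low arithmetic intensity, i.e. memory-hungry code

slow_memory = attainable_gflops(PEAK_GFLOPS, 300.0, FLOPS_PER_BYTE)    # assumed DDR-class bandwidth
fast_memory = attainable_gflops(PEAK_GFLOPS, 1000.0, FLOPS_PER_BYTE)   # assumed HBM-class bandwidth

print(f"slow memory: {slow_memory:.0f} GFLOPS, fast memory: {fast_memory:.0f} GFLOPS")
print(f"speedup from bandwidth alone: {fast_memory / slow_memory:.2f}x")
```

In this toy case both configurations sit well below the compute ceiling, so raising memory bandwidth speeds up the workload even though the peak FLOPS figure never changes.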

So, rather than packing the system full of graphics processors, Crossroads was optimized around memory. As our sibling outlet The Next Platform discussed in its preview of the Crossroads kit last year, while LANL hasn't revealed much in terms of specifics about the system, it reportedly uses a combination of Intel's Sapphire Rapids Xeon Scalable and high-bandwidth memory Xeon Max processors.

The lab appeared to confirm the latter in its announcement this week, noting that high-bandwidth memory "brings memory directly to the processing chip." Intel's Xeon Max processors pack up to 64GB of on-package HBM2e memory and 56 CPU cores into a single package capable of 1TB/s of memory bandwidth. For reference, that's more than twice the bandwidth AMD's 12-channel, fourth-generation Epyc CPUs can deliver when using standard DDR5 modules.
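The "more than twice" comparison is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes DDR5-4800 modules and a 64-bit (8-byte) data path per channel; neither figure is stated in the article, so treat both as our assumptions. The 1TB/s HBM figure is the one quoted above.

```python
def ddr_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Assumption: 12 channels of DDR5-4800, each 8 bytes wide.
ddr5_peak = ddr_bandwidth_gbs(channels=12, mega_transfers_per_s=4800)

HBM_PEAK_GBS = 1000.0  # the roughly 1TB/s quoted for Xeon Max

print(f"12-channel DDR5-4800 peak: {ddr5_peak:.1f} GB/s")
print(f"HBM advantage: {HBM_PEAK_GBS / ddr5_peak:.2f}x")
```

Under those assumptions the DDR5 platform tops out at about 460GB/s, which puts the HBM part a little over 2x ahead on paper. Sustained bandwidth in real workloads will be lower on both sides.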

According to LANL, Crossroads should deliver roughly 4x-8x the performance of Trinity. That system, introduced in 2015, is capable of 41.46 petaFLOPS of peak FP64 performance, though in reality the system only managed about half that in the LINPACK benchmark.

"It hardly ever happens in computing that you can move to a new system and see huge gains without changing the codes," Grider said in the write-up. "But the switch from Trinity to Crossroads will do just that."

Crossroads itself is supported by three smaller systems, Rocinante, Razorback, and Tycho, which were named after ships and stations from The Expanse science fiction novels and television series.

According to LANL, Rocinante is an application regression system, a miniaturized version of the full Crossroads system that can be used to test code in an unclassified environment. Razorback serves a similar role, but rather than catering to developers and scientists, it provides a controlled environment for preparing, testing, and deploying patches and updates before they are pushed out to the full system.

Finally, Tycho, which was among the first Crossroads systems delivered last year, is based on a similar architecture to the full system, but uses conventional memory instead of HBM. According to LANL, this provides compute cycles to "stockpile simulation users who otherwise might have been waiting for Crossroads."

With the system now installed, lab crews and techs are working to run initial diagnostics on the full system. LANL expects to open the machine to the National Nuclear Security Administration labs, which also include Sandia and Lawrence Livermore National Labs, this fall. ®
