Los Alamos Lab powers up Nvidia-laden Venado supercomputer

Promises up to ten AI exaFLOPs of grunt – if you turn down the resolution a bit

Los Alamos National Laboratory (LANL) has flipped the switch on its Venado supercomputer – a machine capable of bringing ten exaFLOPs of performance to bear on AI workloads for the Department of Energy.

Announced at the ISC high performance computing conference in 2022, Venado is among the first supercomputers to be built using Nvidia's Superchip architecture. But before you get too excited about the claimed performance, remember that exaFLOP metric only applies to AI workloads.

As powerful as Venado is, Nvidia hasn't dethroned AMD's 1.1 exaFLOP Frontier system – in fact, it's not even close. Floating point performance has long been the benchmark for supercomputers as seen over the past 30 years of Top500 High Performance Linpack (HPL) runs. But, with the rise of systems tailored to lower precisions and AI workloads, the meaning of the metric has become somewhat muddied.

Instead of the double precision performance listed on the Top500 ranking, the peak floating point performance rating of many systems designed to run AI workloads is often given at half (FP16) – or even quarter (FP8) – precision.

Venado was rated using FP8.

That lofty ten exaFLOP figure was therefore achieved when running under conditions that trade accuracy for higher throughput and lower memory bandwidth demands. That's perfect for running large language models (LLMs) and other machine learning tasks, but maybe not the best option if you're trying to simulate the criticality of a plutonium warhead.
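To make the trade-off concrete, here is a minimal Python sketch, not anything LANL or Nvidia runs, that stores the same value at three precisions and reports the storage cost and rounding error. Python's standard library has no FP8 type, so FP16 stands in as the lowest precision shown. The helper name `round_trip` and the choice of 0.1 as the test value are our own.

```python
import struct

# struct format codes for IEEE 754 double, single, and half precision.
# FP8 is not available in the standard library, so it is left out here.
FORMATS = {"FP64": "<d", "FP32": "<f", "FP16": "<e"}


def round_trip(value: float, fmt: str) -> float:
    """Store value in the given format, then read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


value = 0.1  # illustrative test value, not taken from any benchmark
results = {}
for name, fmt in FORMATS.items():
    stored = round_trip(value, fmt)
    # Record bytes per value and the absolute rounding error.
    results[name] = (struct.calcsize(fmt), abs(stored - value))

for name, (size, error) in results.items():
    print(f"{name}: {size} bytes per value, rounding error {error:.3e}")
```

Each step down in precision halves the bytes moved per value, which is where the throughput and bandwidth savings come from, and the rounding error grows accordingly.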

Although Venado can't hold a candle to Frontier in FP64 workloads, it's no slouch. Thanks to the presence of Nvidia's H100 GPUs providing the bulk of the system's power, the machine should be able to churn out about 171 petaFLOPs of peak double precision performance – enough to just barely beat out the number 10 ranked system on November's Top500 ranking. Though we'll note actual performance in the HPL is usually a fair bit lower.
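For the curious, the headline numbers can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes per-GPU peaks from Nvidia's published H100 SXM datasheet (67 teraFLOPS for FP64 on tensor cores, 3,958 teraFLOPS for FP8 with sparsity) and counts only the GPUs in the 2,560 GH200 modules. Neither LANL nor Nvidia has confirmed this is how the figures were derived, and the function name is our own.

```python
# Module count as reported for Venado; each GH200 carries one H100 GPU.
GH200_COUNT = 2_560

# ASSUMPTION: per-GPU peak rates from Nvidia's H100 SXM datasheet.
H100_FP64_TENSOR_TFLOPS = 67      # FP64 on tensor cores
H100_FP8_SPARSE_TFLOPS = 3_958    # FP8 on tensor cores, with sparsity


def system_peak_pflops(gpu_count: int, tflops_per_gpu: float) -> float:
    """Theoretical peak in petaFLOPS: GPUs times per-GPU teraFLOPS, over 1,000."""
    return gpu_count * tflops_per_gpu / 1_000


fp64_pflops = system_peak_pflops(GH200_COUNT, H100_FP64_TENSOR_TFLOPS)
fp8_pflops = system_peak_pflops(GH200_COUNT, H100_FP8_SPARSE_TFLOPS)
fp8_eflops = fp8_pflops / 1_000

print(f"Peak FP64: {fp64_pflops:.1f} petaFLOPS")
print(f"Peak FP8:  {fp8_eflops:.2f} exaFLOPS")
print(f"FP8 to FP64 ratio: {fp8_pflops / fp64_pflops:.0f}x")
```

Under these assumptions the sums come to roughly 171.5 petaFLOPS at FP64 and 10.13 exaFLOPS at FP8, in line with the figures quoted here. If the assumptions hold, the ten exaFLOP claim leans on sparsity as well as FP8.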

"With its ability to incorporate artificial intelligence approaches, we are looking forward to seeing how the Venado system at Los Alamos can deliver new and meaningful results for areas of interest," David Turk, deputy secretary for the Department of Energy, wrote in a statement.

So far, LANL says, the system – which was delivered last month – has already shown promise running material science and astrophysics simulations. That suggests the machine will do its fair share of HPC simulation alongside lower precision AI workloads.

Housed at LANL's Nicholas C Metropolis Center for Modeling and Simulation, Venado is a relatively compact system built in collaboration with Nvidia and HPE Cray, using the latter's EX platform and Slingshot 11 interconnects.

The all liquid-cooled system comprises 3,480 Nvidia Superchips – 2,560 GH200s and 920 Grace-Grace CPU modules.

As we've discussed in the past, the GH200 is essentially a system-on-module aimed at HPC and AI workloads. It pairs a 72-core Grace CPU, based on Arm's high-end Neoverse V2 cores and backed by 480GB of LPDDR5x memory, with an H100 GPU carrying 96GB or 144GB of HBM. The two are linked by a 900GB/sec NVLink-C2C interconnect.

Nvidia's Grace CPU Superchips swap the GPU for a second Grace CPU, for a total of 144 cores linked by the same NVLink-C2C interconnect. Those cores are fed by up to 960GB of LPDDR5x memory capable of delivering upwards of 1TB/sec of bandwidth.
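Tallying the CPU side from the figures above gives a sense of scale. This is a rough sketch using only the module and core counts quoted in this article. The variable names are ours, and the memory total is an upper bound because the Grace CPU Superchip's 960GB is quoted as a maximum.

```python
# Figures quoted in the article.
GH200_MODULES = 2_560          # one Grace CPU plus one H100 GPU each
GRACE_CPU_SUPERCHIPS = 920     # two Grace CPUs each
CORES_PER_GRACE = 72

LPDDR5X_PER_GH200_GB = 480
LPDDR5X_PER_GRACE_SUPERCHIP_GB = 960  # "up to", so treat as a ceiling

total_modules = GH200_MODULES + GRACE_CPU_SUPERCHIPS

gh200_cores = GH200_MODULES * CORES_PER_GRACE
grace_superchip_cores = GRACE_CPU_SUPERCHIPS * 2 * CORES_PER_GRACE
total_cores = gh200_cores + grace_superchip_cores

# Upper bound on CPU-attached LPDDR5x across the system, in gigabytes.
max_lpddr5x_gb = (
    GH200_MODULES * LPDDR5X_PER_GH200_GB
    + GRACE_CPU_SUPERCHIPS * LPDDR5X_PER_GRACE_SUPERCHIP_GB
)

print(f"Superchips: {total_modules:,}")
print(f"Grace cores: {total_cores:,}")
print(f"LPDDR5x ceiling: {max_lpddr5x_gb / 1_000_000:.2f} PB")
```

On these numbers the system carries 316,800 Arm cores in total, with a little over 40 percent of them sitting in the CPU-only modules.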

According to LANL, these Grace CPU Superchips should boost performance for a wide range of HPC applications, especially those that aren't optimized for, or well suited to, GPU accelerators.

While you might think an Arm-based system means HPC wonks need to re-skill in a hurry, the supercomputing community has been working with Arm systems for a while now – as our sibling site The Next Platform has previously discussed – dating back to Cavium's ThunderX and Fujitsu's A64FX platforms.

Venado won't even be the largest Grace-Hopper system we see this year. The UK government's Isambard-AI will be powered by 5,448 Nvidia GH200s. Meanwhile, the GPU partition of EuroHPC's Jupiter system will pack close to 24,000 Grace-Hopper Superchips. ®
