UK's dream of fusion power by 2040s will need GPUs
Boffins plan 'digital twin' to help hit deadline, recruit heavy-hitting partners
The UK Atomic Energy Authority (UKAEA) has recruited Intel and the University of Cambridge to provide the compute resources it needs to develop Britain's prototype nuclear fusion reactor – including building a "digital twin" of the design to help with testing.
The Spherical Tokamak for Energy Production (STEP) is Britain's plan to demonstrate that fusion energy can be economically viable, and aims to put fusion-generated electricity onto the national grid by the early 2040s.
But according to the UKAEA, there just isn't enough time to build everything using a traditional iterative process of designing and then testing all the subsystems – so the team plans to make extensive use of virtual models, backed by supercomputing and AI, to speed things up.
"In the same way that the aerospace sector has moved wind tunnels into the world of computational fluid dynamics, or the automotive sector has moved the process of crash testing into the virtual world using finite elements, we need to do the same for designing fusion power plants," explained the UKAEA's director of computing programmes Dr Rob Akers at a press conference on Wednesday.
The problem, he opined, is that a fusion reactor is an incredibly complex, strongly coupled system and the models for how fusion power plants operate are limited in their accuracy.
"There's a lot of physics that spans the entire assembly – from structural forces to thermal heat loads, to electromagnetism and radiation. It's really a grand challenge simulation problem, which is where supercomputing and artificial intelligence come in," he explained.
This is why UKAEA is working with Intel and the University of Cambridge – not just on the kind of exascale technology needed, but also on how to handle and process the enormous volumes of data on which the project will rely.
And that means GPUs.
The current supercomputer at the Cambridge Open Zettascale Lab is built with Dell PowerEdge servers based on Intel's Sapphire Rapids 4th-gen Xeon Scalable processors – but the project is looking at extending this with Intel's Data Center GPU Max accelerators, codenamed Ponte Vecchio.
"Traditional x86 systems are not going to get us there – we need to be using GPU technology," said director of research computing services at the University of Cambridge, Dr Paul Calleja.
Ponte Vecchio systems provide an order of magnitude more grunt in terms of performance per watt, he claimed. But he noted that as soon as you start deploying GPUs, you have a problem with the programming environment.
"How do you program for a GPU world so you're not locked into a single vendor solution? Because we might work with Intel today, but who knows what's gonna happen in the future. We don't want our codes to be locked into a particular vendor."
For this reason, the project is also looking at Intel's oneAPI programming model that supports so-called heterogeneous computing. This will provide a way for the applications to run on Intel GPUs, or on Nvidia GPUs – or even AMD GPUs – with minimal recoding required, Calleja claimed.
Dealing with a data deluge
Intel is also striking it lucky, with the Open Zettascale Lab settling on the chip shop's DAOS (Distributed Asynchronous Object Storage) platform to handle the storage for the project. Calleja oddly described DAOS as a new file system, although it has actually been around for several years.
"Data bottlenecks are a real problem when you're trying to feed tens of thousands of GPUs, and here we're looking at solid state storage, NVMe storage, and parallel file systems are traditionally not good at exploiting NVMe technologies," Calleja said, claiming that DAOS will help to "unlock the performance hidden in NVMe drives."
The other key development is that HPC and artificial intelligence are converging at the exascale level, according to Akers, allowing a "digital twin" of STEP to be used in building it.
"We need to turn HPC into an engineering design tool through the deployment of something called surrogate models, where we synthesize all of the information that we extract from simulation to turn engineering into a tool that can be used for the design," he explained.
"The hope is that within the next decade, by exploiting the UK exascale roadmap that was recently announced and the funding that was announced in the recent budget, we will be able to get to the point where a digital version of STEP can be developed, before the real plant itself. Then we'll be able to use that digital version of the STEP power plant to dramatically reduce the need for real world validation," Akers said.
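Surrogate models of the kind Akers describes stand in for expensive simulations: run the full physics code a handful of times, fit a cheap approximation to the results, then query that approximation thousands of times during design iteration. Here's a minimal sketch of the principle in Python, using a made-up one-parameter "simulation" and a quadratic least-squares fit – purely illustrative, not a real plasma model or anything STEP actually uses:

```python
import math

def expensive_simulation(x):
    # Stand-in for a costly physics run (hypothetical example):
    # a smooth response to a single design parameter x.
    return math.exp(-0.5 * x) * math.sin(x)

def solve3(A, b):
    # Solve a 3x3 linear system by Gaussian elimination with partial pivoting.
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_surrogate(xs, ys):
    # Least-squares fit of y ~ a + b*x + c*x^2 via the normal equations.
    S = [sum(x ** k for x in xs) for k in range(5)]
    A = [[S[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    a0, a1, a2 = solve3(A, b)
    return lambda x: a0 + a1 * x + a2 * x * x

# A few "expensive" runs over the design range...
xs = [i * 0.25 for i in range(9)]          # 0.0 .. 2.0
ys = [expensive_simulation(x) for x in xs]

# ...then a surrogate that is essentially free to evaluate.
surrogate = fit_surrogate(xs, ys)
worst = max(abs(surrogate(x) - y) for x, y in zip(xs, ys))
print(f"worst in-sample error: {worst:.3f}")  # small relative to the signal
```

The real thing would synthesize data from many coupled exascale simulations into machine-learned models rather than a one-line polynomial, but the economics are the same: pay for the expensive runs once, then iterate on the design against the cheap stand-in.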
According to Calleja, it is also not yet clear what shape the UK's exascale capabilities may take, or whether there will be one central system or a number of distributed units.
"It's still to be decided what the exascale ecosystem may look like – whether we're going to have one large exascale system as a single machine at a single location, or should we have a federation of large machines with different technologies at different places, that together add up to exascale," he said, adding: "I personally believe more in a federation of exascale systems with different technologies." ®