Durham Uni and Dell co-design systems to help model the universe
A particle physicist, an astronomer and a cosmologist meet in a bar. You what, COSMA8?
The James Webb Space Telescope has served up impressive views of the cosmos since the first images were revealed back in July, but it is also providing data to other scientific endeavors, including cosmology projects such as those at Durham University in the northeast of England.
Durham is part of the Distributed Research utilizing Advanced Computing (DiRAC) infrastructure, established to provide supercomputing facilities for theoretical modelling and HPC-based research in particle physics, astronomy and cosmology at several university sites around the UK.
The Durham part of the system, COSMA (COSmology MAchine), has been built with lots of memory per compute core, making it ideal for memory-intensive workloads such as large cosmological simulations. The latest system, COSMA8, comprises 360 compute nodes, each with two 64-core AMD Epyc processors and a terabyte (1TB) of memory.
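On those published figures, the aggregate scale of COSMA8 works out as follows (a back-of-the-envelope calculation using only the per-node numbers above):

```python
# Aggregate COSMA8 figures, from the per-node spec in the article:
# 360 nodes, each with two 64-core AMD Epyc CPUs and 1TB of RAM.
nodes = 360
cores_per_node = 2 * 64
ram_tb_per_node = 1

total_cores = nodes * cores_per_node    # 46,080 cores
total_ram_tb = nodes * ram_tb_per_node  # 360TB of RAM across the cluster

print(f"{total_cores} cores, {total_ram_tb}TB RAM")
```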
This is used for large-scale simulations of the universe, where the models can be tweaked in line with various theories of how the universe evolved to the position we see today, and the predictions of the models compared with actual data from the James Webb Space Telescope to gauge how well they represent reality.
"We start with The Big Bang and then we propagate these simulations forward in time and basically evolve the universe over billions of years and see how it changes," Dr Alastair Basden, Head of the COSMA HPC Service, told us.
"So, for things that we don't really properly understand yet, like dark matter and dark energy, we're able to tweak the input parameters, like what if gravity behaves differently over long distances and things like that. And we're able to tweak all these parameters and then try to match them to what we see from the James Webb images once they've been calibrated and scaled."
Durham has several generations of COSMA operating at the same time. Both the latest cluster, COSMA8, and its predecessor were designed in collaboration with Dell in order to get the optimal configuration for the workloads in hand.
"We've got about 8GB of RAM per core on each node. If you look at a more conventional HPC system, they'll have about a quarter of that, and that means to run the same simulations that we can run here, you would need four times as many cores to get the results in the same length of time, so it’s very much a bespoke design for these cosmology simulations,” Basden said.
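Basden's arithmetic can be checked with a worked example (the 100TB job size here is invented for illustration; the per-core figures are from his quote):

```python
# A simulation's node count is driven by its total memory footprint,
# so halving or quartering the RAM per core multiplies the cores needed.
sim_footprint_gb = 100 * 1024  # a hypothetical 100TB simulation

cores_needed_cosma8 = sim_footprint_gb / 8        # 8GB of RAM per core
cores_needed_conventional = sim_footprint_gb / 2  # "about a quarter": ~2GB/core

print(cores_needed_conventional / cores_needed_cosma8)  # -> 4.0
```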
Another feature of COSMA8 is a high-performance NVMe-based checkpointing storage system based on the Lustre file system. This is a common feature of HPC deployments, allowing a large workload that requires lengthy runtime to store its state as it goes along, so that it does not have to start from scratch in the event of some failure.
"It's a very fast file system, about 400GB per second, able to suck up the data checkpoints. So the simulation will be running along, it will dump a checkpoint every few hours or something like that, so if something goes wrong, you've got a point we can restart that simulation from," Basden explained.
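The checkpoint/restart pattern Basden describes is common to long-running HPC codes. A minimal sketch of the idea (illustrative only: the file name is hypothetical, and `pickle` stands in for whatever binary format a real simulation code dumps to its Lustre file system):

```python
import os
import pickle
import tempfile

CHECKPOINT = "state.ckpt"  # hypothetical path; on COSMA8 this would sit on Lustre

def save_checkpoint(state, path=CHECKPOINT):
    """Write the state to a temp file, then atomically rename it into place,
    so a crash mid-write cannot corrupt the last good checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path=CHECKPOINT):
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0}  # initial state of a fresh run

# Main loop: checkpoint periodically; after a failure, the same code
# simply resumes from wherever the last dump left off.
state = load_checkpoint()
while state["step"] < 10:
    state["step"] += 1          # stand-in for one timestep of the simulation
    if state["step"] % 5 == 0:  # "every few hours" in a real run
        save_checkpoint(state)
```

The atomic-rename step matters: if the job dies while a checkpoint is half-written, the previous checkpoint is still intact and the run can restart from it.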
The main file system for COSMA8 is built on Dell PowerEdge R6525 and R7525 rack servers plus PowerVault ME484 JBOD enclosures, and the whole thing is connected with a 200Gbps InfiniBand fabric.
Validating tech including liquid cooling
But it appears that Dell's relationship with the Durham team is more than just that of a supplier or system integrator, as the university often gets early access to new or experimental technologies, which enables both parties to see how well they stand up when put to work, according to Tim Loake, UK VP for Dell's Infrastructure Solutions Group.
"Durham is a key partner of ours, so they're one of our HPC Centers of Excellence, in terms of us assisting them in trying out some of our new technologies, as well as getting them to test and give feedback on them," Loake said.
"We give Alastair and the Durham team access to our labs and to early release products, and get their feedback to help take the knowledge and experience that they have from running a very high end HPC system and feed that back into our product development, as well as bring new technologies to them," he explained.
As one example, Dell introduced a switchless interconnect into the system from a company called Rockport Networks. This distributes the traffic management to intelligent endpoint network cards in each node that are linked together via a passive hub called a SHFL.
Another area where Durham played a part in validating the technology is liquid cooling, according to Loake. This was installed as part of COSMA8 in early 2020, and expanded about a year later.
"It was probably the largest direct liquid cooling system that we had deployed, certainly in the UK and probably across Europe, when we first put it out," Loake said.
“Obviously, direct cooling now is becoming more mainstream across many datacenters and HPC systems, but directionally, it was a lot of the learnings that we took from working with Alastair and the team at Durham that has then fed into the product design that we're now bringing out with the next generation of PowerEdge servers,” he added.
This deployment used direct liquid cooling, where the coolant is circulated through heatsinks attached to the components that generate the most heat, such as the CPU.
However, interest is now turning to immersion cooling, where the entire system is submerged in a dielectric fluid that conducts heat but not electricity.
“Full immersion is something that we're very interested in, and we're actually trying to get some funding for an immersion system at the moment,” Basden said.
“Part of the advantage of immersion cooling is that you remove all the fans, all the moving parts, so you don't put spinning drives in either, it has to be a pure flash system, and no moving parts means that the need for maintenance is hopefully greatly reduced,” said Loake.
However, most of the immersion cooling systems Dell is working with have the ability to raise an individual node out of the fluid, should access be required, he added.
“Think of it as like a 42U rack tipped on its side and then you can just pull a node straight up as if you were pulling it out the front of a normal rack, but they come up and then obviously the liquid drains into the bath. The rest of the systems are unaffected and you can do whatever maintenance you need to,” he said.
Other technologies being tried out include FPGA accelerators and Nvidia BlueField data processing units (DPUs), while Dell is also evaluating other kinds of CPUs to see whether performance comes down purely to raw core counts, or whether different processors can deliver more performance, or better performance per watt.
According to Basden, some of the technologies they test out are evaluated for immediate future use in projects, while others are looking further out. One of the latter is the Excalibur project which is part of the UK's preparations for exascale computing.
"Mostly it's software efforts in getting code ready to run on large exascale systems, but a small part of it is hardware as well, looking at what novel hardware might have potential within an HPC system," he said.
This includes the Rockport interconnect technology as well as composable infrastructure from Liqid that allows GPUs to be assigned to different nodes. Liqid develops a software layer that pulls together components connected via a PCIe 4.0 fabric.
The latter is useful for the Durham setup because of the nature of its workloads, according to Basden. With their large memory footprint and the dynamic nature of the calculations, the cosmology codes tend not to be well suited to GPU acceleration, but some calculations can benefit from it, so composable infrastructure allows GPUs to be switched over to a node when required.
"For some simulations that need a large number of GPUs, perhaps on a smaller number of nodes, they can have that," he said.
At the moment, the composable infrastructure is only implemented in a small prototype system, but "it's one of those things that if we were ever going to do that in a large-scale future system, we'd need to have built up the confidence first that it was gonna work," Basden explained. ®