HPE, Cerebras build AI supercomputer for scientific research
Wafer madness hits the LRZ in HPE Superdome supercomputer wrapper
HPE and Cerebras Systems have built a new AI supercomputer in Munich, Germany, pairing an HPE Superdome Flex with the AI accelerator technology from Cerebras for use by the scientific and engineering community.
The new system, created for the Leibniz Supercomputing Center (LRZ) in Munich, is being deployed to meet the current and expected future compute needs of researchers, including larger deep learning neural network models and the emergence of multi-modal problems that involve multiple data types such as images and speech, according to Laura Schulz, LRZ's head of Strategic Developments and Partnerships.
"We're seeing an increase in large data volumes coming at us that need more and more processing, and models that are taking months to train, we want to be able to speed that up," Schulz said.
"And then we're also seeing multi-modal problems, such as integration of natural language processing (NLP) and medical imaging or documents, so we have this complexity, we have this need for faster, we have this need for bigger that's coming from our user side, from our facility side, and we need to make sure that we're constantly evaluating to have these different novel architectures, to have different usage models to be able to understand all that."
The LRZ team decided that the Cerebras technology, with its large shared memory and scalability, was a good match for the "pain points" they were trying to resolve, she said.
"And then the combination of Cerebras with the Superdome Flex from HPE, this seemed to make sense. The Superdome Flex has a very efficient pre and post-data processing, the resource management, it keeps the Cerebras system fed, and it keeps it nice and happy and full of data."
The Cerebras technology is built on the concept of using an entire silicon wafer to make the central processor, rather than cutting it up into individual chips. The result is a Wafer-Scale Engine that has 850,000 cores optimized for sparse linear algebra operations with 40GB of on-chip memory, fabricated using a 7nm production process.
"Each one of these cores is identical and fully programmable, built from the ground up to optimize performance for the sparse linear algebra compute operations that are common both to large scale AI and HPC workloads," explained Cerebras VP of product management Andy Hock.
"Each one of those cores is also directly connected to its four nearest neighbours across the entire device, within a high bandwidth, low latency interconnect mesh, and the dataflow traffic pattern between cores is fully programmable at compile time. So not only are we bringing massive compute resources to bear but also very high bandwidth memory and high bandwidth communication between those processors."
This architecture contrasts with the typical approach to tackling large-scale AI problems, which is to build a large cluster of servers. But as the problems and the models get ever larger, this approach shows diminishing returns, according to Cerebras.
"The time to solution doesn't scale linearly. So you might bring to bear, say, hundreds of processors, but only get a result tens of times faster," Hock said.
In contrast, the Wafer-Scale Engine, with its closely linked host of processors, allows for linear performance scaling out to models that are on the order of hundreds of millions of parameters or even billions of parameters, Hock claimed.
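The cluster side of that comparison can be made concrete with a minimal sketch based on Amdahl's law. This is a standard textbook model that Hock did not cite, and the 99 percent parallel fraction and the 256-processor count are illustrative assumptions, not figures from Cerebras or LRZ.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workload: 99% of the runtime parallelises perfectly.
PARALLEL_FRACTION = 0.99

for n in (16, 64, 256, 1024):
    speedup = amdahl_speedup(PARALLEL_FRACTION, n)
    efficiency = speedup / n  # fraction of the ideal linear speedup achieved
    print(f"{n:5d} processors: {speedup:6.1f}x faster, {efficiency:.0%} efficiency")
```

Under these assumed numbers, a few hundred processors deliver a speedup only in the tens, which is the shape of the diminishing returns described above. Real clusters also lose time to communication, which this simple model ignores.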
The CS-2 system has one of these Wafer-Scale Engines inside a 15U rack-mount chassis along with redundant power supplies, an internal liquid cooling system, and a dozen 100GbE network ports to link to the outside world.
In the new LRZ deployment, one CS-2 is linked using all those network ports to the HPE Superdome Flex via an SN3700M Switch using a fully non-blocking topology.
The Superdome Flex is fitted with eight InfiniBand HDR100 adapters to link it to the LRZ's backbone network and, according to HPE, "ensure a proper injection bandwidth from the file system to sustain the expected very high performance of the CS-2 and feed it smoothly."
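For a sense of scale, the nominal aggregate bandwidth of the two sets of links can be worked out from the port counts given above. This is a back-of-the-envelope sketch: it uses line rates only (100Gb/s for each 100GbE port and each HDR100 adapter) and ignores protocol overhead, so sustained throughput will be lower.

```python
GBIT_PER_PORT = 100  # nominal line rate of a 100GbE port or an HDR100 adapter, in Gb/s


def aggregate_gbit(ports, gbit_per_port=GBIT_PER_PORT):
    """Nominal aggregate bandwidth in Gb/s for a number of identical ports."""
    return ports * gbit_per_port


cs2_links = aggregate_gbit(12)      # a dozen 100GbE ports on the CS-2
backbone_links = aggregate_gbit(8)  # eight InfiniBand HDR100 adapters

# Divide by 8 to convert gigabits to gigabytes.
print(f"CS-2 to Superdome Flex: {cs2_links} Gb/s ({cs2_links / 8:.0f} GB/s)")
print(f"Superdome Flex to backbone: {backbone_links} Gb/s ({backbone_links / 8:.0f} GB/s)")
```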
The Superdome Flex itself has 16 Intel Xeon processors with 12TB of system memory and 100TB of NVMe local storage.
However, the key part of the whole system is the software stack, which Cerebras claims enables the LRZ scientists and researchers to use standard AI software tools to build code that will run on the CS-2.
"We know as researchers and scientists ourselves at Cerebras that a high performance computing platform is only good if it's easy to use, and we can only reach that broader user audience if we allow them to programme this machine with standard frameworks that they're using today," Hock said.
This is achieved through a compiler that lets users such as machine learning researchers and data scientists develop in standard frameworks like TensorFlow and PyTorch, and translates their code into an executable that can run on one or more CS-2 devices.
"We also have a lower level software development kit that many of our HPC audience are using to add custom kernels into our compiler to bring to life not just AI applications, but also HPC applications for a wide range of projects ranging from signal processing to physics based modelling and simulation," Hock said.
Dr Dieter Kranzlmüller, chair of the Board of Directors at LRZ, said that the new system would be used to combine aspects of traditional HPC and AI processing for some applications.
"We're following here an integrated supercomputer architecture, which means that the future HPC system is heterogeneous and we'll pick up all the help we can get from advanced technology," he said.
"So the idea is really that you will have your HPC application, but the HPC application will get additional benefits reaching out to specialized solutions for particular applications or a sort of a model where we improve the time to solution by replacing part of the complexity with AI models, then run these on a specialized chip, on specialized hardware."
The new deployment is not the first installation to pair a Cerebras system with an HPE Superdome server. As detailed by our sister site The Next Platform, the Neocortex supercomputer at the Pittsburgh Supercomputing Center at Carnegie Mellon University actually has two Cerebras systems, linked together and to the Bridges-2 supercomputer via the Superdome. ®