Analysis IBM offered HPC fans at SC17 a gawk at the server tray for the upcoming Summit supercomputer at Oak Ridge National Laboratory (ORNL), Tennessee.
This is the system slated to knock China's 93 petaFLOPS Sunway TaihuLight system off the top of the supercomputer tree when it goes live. It is slated to pump out a hoped-for 200 petaFLOPS.
The Summit system follows on from ORNL's current 27 petaFLOP Titan system, computing 5-10 times faster, storing eight times more data and moving it 5-10 times faster as well. It will enable simulation models with finer resolution than Titan, meaning higher fidelity and more accurate simulations.
Summit will have around 4,600 server tray nodes, which will use IBM's Witherspoon Power S922LC trays.
SC17 Summit server tray tweet (https://twitter.com/ibmpowerlinux)
According to Tom's Hardware, these water-cooled trays feature a pair of POWER9 processors, each connected by a 150GB/sec NVLink 2.0 to three 7.5 teraFLOP NVIDIA Volta V100 accelerators (each with a GV100 GPU) which are inter-connected across the NVLink.
Volta GV100 GPU with 84 streaming multiprocessors
Both the CPUs and the GPUs are water-cooled. There is 300GB/sec of aggregated NVLink bandwidth.
The POWER9 CPUs have up to 24 cores and 96 threads. NVLink supports CPU mastering and cache coherence capabilities with IBM POWER9 CPU-based servers. The tray will have from 512GB to 2TB of coherent DDR4 memory, with 340GB/sec of memory bandwidth. All six GPUs and the two POWER9 CPUs can access main memory.
The system uses will use PCIe gen 4 and CAPI to hook up SSDs, FPGAs and NICS, and there is 1.TB of bust buffer NV-RAM.
Trays will be connected across Mellanox InfiniBand links, 100Gbit/s EDR.
The Summit machine will have up to 250PB of storage, accessed by Spectrum Scale (GPFS) and 2.5TB/sec of aggregate bandwidth. This is interfaced via the burst buffers.
Simplistically the data flow is from Spectrum Scales across InfiniBand and into a server node's memory. Each POWER9 CPU controls the activities of three GPUs and these eight compute entities access main memory and much data. The results are streamed out to the burst buffer and then pushed out to the GPFS storage.