A Canadian supercomputer centre using a fast access parallel file system has stuffed an Excelero burst buffer between this storage and the compute nodes.
Why, you ask?
We'll explain. The SciNet supercomputer centre at the University of Toronto provides resources for thousands of researchers in biomedical, aerospace, climate sciences, and more. Its supercomputing jobs - large-scale modelling, simulation, analysis and visualization applications - can sometimes run for weeks, and interruptions delay or occasionally destroy an entire job's results, meaning it has to be run again.
Checkpointing, which lets an interrupted job restart quickly from its last saved state, has been used to reduce that risk. But with the disk-based Spectrum Scale (GPFS) storage, checkpoints take longer as individual jobs become larger, making the calculation difficult or, in the worst case, impossible to carry out.
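For readers who haven't met the technique, here is a minimal sketch of checkpoint and restart in Python. It is purely illustrative and not SciNet's code: the function names, the pickle file format and the toy "job" (a running sum) are all assumptions made for the example. Real HPC checkpoints write many terabytes of state in parallel, which is why storage bandwidth matters.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach storage before the rename
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run_job(path, total_steps, checkpoint_every, fail_at=None):
    """Toy job (hypothetical): sums the step numbers, saving state every few steps.

    fail_at simulates an interruption at the given step.
    """
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated interruption")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If the job is interrupted at step 55 with a checkpoint every 10 steps, a rerun picks up from step 50 rather than from zero. The slower the checkpoint write, the less often a job can afford to take one, and the more work is lost on each interruption.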
The new idea is to use a flash-based burst buffer between the disks and the compute nodes, so checkpointing can be done faster. NVMe flash drives were fitted to some of the compute nodes, which already had a low-latency fabric interconnect, and virtualized into a shared flash pool using Excelero's NVMesh software.
There are 80 NVMe flash drives in 10 servers, which support the NSD (Network Shared Disk) protocol. Collectively this burst buffer system is said to provide 20 million random read 4K IOPS, 148GB/sec of write burst bandwidth and 230GB/sec of read throughput. Checkpoints can be completed in 15 minutes.
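A quick back-of-the-envelope calculation puts those numbers in per-drive terms. The sketch below uses only the figures quoted above. It assumes the load is spread evenly across all 80 drives and that the full write burst bandwidth is sustained for the whole 15-minute window. Neither assumption is confirmed by SciNet or Excelero, so the checkpoint size it gives is an upper bound, not a reported figure.

```python
# Figures as quoted for the SciNet burst buffer.
DRIVES = 80
SERVERS = 10
READ_IOPS = 20_000_000       # random read 4K IOPS
WRITE_GB_PER_SEC = 148       # write burst bandwidth
READ_GB_PER_SEC = 230        # read throughput
CHECKPOINT_MINUTES = 15


def per_drive(total, drives=DRIVES):
    """Even split across drives (an assumption, not a measured value)."""
    return total / drives


def max_checkpoint_tb(write_gb_per_sec=WRITE_GB_PER_SEC, minutes=CHECKPOINT_MINUTES):
    """Upper bound on data written in the window, in decimal terabytes,
    assuming peak write bandwidth is sustained throughout."""
    return write_gb_per_sec * minutes * 60 / 1000


if __name__ == "__main__":
    print("drives per server:", DRIVES // SERVERS)
    print("read IOPS per drive:", per_drive(READ_IOPS))
    print("write GB/sec per drive:", per_drive(WRITE_GB_PER_SEC))
    print("read GB/sec per drive:", per_drive(READ_GB_PER_SEC))
    print("max checkpoint size (TB):", max_checkpoint_tb())
```

On those assumptions each drive would account for roughly 250,000 read IOPS, 1.85GB/sec of writes and just under 2.9GB/sec of reads, and a 15-minute checkpoint could move at most about 133TB.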
Dr Daniel Gruner, CTO at the SciNet High Performance Computing Consortium, said: "NVMesh is an extremely cost-effective method of achieving unheard-of burst buffer bandwidth."
The NVMesh burst buffer "enables standard servers to go beyond their usual role in acting as block targets – the servers now can also act as file servers."
It would be interesting to compare the performance and cost of this NVMesh configuration with DDN's IME burst buffer. ®