Comment Parallel file systems were developed to overcome delays servers experienced when accessing files on disk storage systems. Flash arrays get rid of disk access latencies and so weaken the need for parallel file systems.
Spectrum Scale, the renamed GPFS (General Parallel File System), and Lustre are two such parallel file systems. Instead of waiting for one IO stream to fill a server with data from a file system, they use multiple simultaneous IO streams, which fill the server much more quickly. Such technologies are used with parallel compute clusters for high-performance computing applications such as simulations, seismic analysis and reservoir modelling, and enable the storage resource to support many more IO requests over a period of time.
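To make the multiple-stream idea concrete, here is a minimal sketch in Python of one client reading a single file through several concurrent streams instead of one. It is an illustration only, not how GPFS or Lustre are implemented: the names `read_range` and `parallel_read` and the default of four streams are assumptions made for the example, and a real parallel file system also stripes the data across many storage servers.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_range(path, offset, length):
    """Read one byte range through its own file handle (one IO stream)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_read(path, streams=4):
    """Split the file into contiguous ranges and read them concurrently.

    `streams` is a hypothetical tuning knob for this sketch.
    """
    size = os.path.getsize(path)
    if size == 0:
        return b""
    chunk = -(-size // streams)  # ceiling division: bytes per stream
    offsets = range(0, size, chunk)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # pool.map yields results in offset order, so the join rebuilds the file
        parts = pool.map(lambda off: read_range(path, off, chunk), offsets)
        return b"".join(parts)
```

On a single local disk this gains little; the benefit appears when each range is served by a different storage node, which is the situation parallel file systems are built for.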
Disk arrays are good at streaming IO but poor at random IO, as this involves more disk seeks. In an interview with the Next Platform, Pure Storage’s Matt Kixmoeller, its VP for products, said Pure wanted to bypass parallel file systems, which involve extra layers of code on top of an underlying file or object system, and have applications on the accessing servers connect directly with the underlying file/object system on a Pure FlashBlade array. He said a baseline with NFS is a good starting point.
That way there would be no need for extra hardware and software to cache random IO data and make it sequential before sending it to the parallel file system. Thus burst buffers would not be needed, and the cost and complexity of running such applications would be reduced by using base flash storage arrays with NFS, rather than burst buffers fronting GPFS or Lustre with a back-end disk drive array.
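For readers unfamiliar with what a burst buffer does, the caching step described above can be sketched roughly as follows. This is a toy model under stated assumptions, not any vendor's design: the `BurstBuffer` class and its `write` and `drain` methods are hypothetical names, writes are assumed not to overlap, and the buffer lives in memory rather than on a flash tier.

```python
class BurstBuffer:
    """Toy write-back buffer: absorbs random writes, drains them in offset order."""

    def __init__(self):
        self.pending = {}  # offset -> bytes, in whatever order the writes arrived

    def write(self, offset, data):
        """Accept a write at any offset without touching the back-end store."""
        self.pending[offset] = data

    def drain(self):
        """Return (offset, data) runs sorted by offset, merging contiguous writes.

        Assumes writes do not overlap; a real buffer would have to resolve that.
        """
        runs = []
        for offset in sorted(self.pending):
            data = self.pending[offset]
            if runs and runs[-1][0] + len(runs[-1][1]) == offset:
                # This write starts where the previous run ends: extend the run
                runs[-1] = (runs[-1][0], runs[-1][1] + data)
            else:
                runs.append((offset, data))
        self.pending.clear()
        return runs
```

Four scattered writes at offsets 8, 0, 4 and 20 would drain as two sequential runs, one starting at 0 and one at 20, which is the kind of access pattern a disk-backed file system handles well. With flash underneath, the argument goes, this reordering layer has nothing left to fix.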
Talking about the FlashBlade beta program, he said: “I would say that the people who came into the early adopter program have found the upper boundaries of what NetApp and Isilon can do, and so FlashBlade is providing them a breakthrough in terms of dramatically more compute and simulation power than they had before.”
Let’s add in NVMe over Fabrics ideas, which effectively banish network access latency, and we could be seeing petabyte-plus flash arrays providing pretty-near instant large dataset access as a matter of course in years to come, with none of today’s burst buffer/parallel file system complexity – yep, just some everyday HPC. ®