TACC wrangles IBM GPFS on to DSSD for HPC LOLs

Or, we could say: legacy parallel file system on fresh flash rack storage

A data-intensive supercomputer in Texas is using more than half a petabyte of DSSD flash storage, along with IBM’s Spectrum Scale (GPFS) parallel file system, to provide massively fast random access to gazillions of small files.

The Wrangler system is being built because traditional HPC systems are poor at dealing with small read accesses to many files, being better suited to large sequential access to fewer files. Wrangler uses different and faster storage to serve millions of small random data accesses to its compute cores.

Its data researchers want to work with data in small files and not get involved with vectorisation, code optimisations, Lustre striping and the Message Passing Interface (MPI).

Wrangler is located at the Texas Advanced Computing Center (TACC) and is being built in partnership with Indiana University and the University of Chicago.

It is supported by a grant from the Division of Advanced Cyberinfrastructure at the National Science Foundation. The hardware suppliers include Dell, EMC and Mellanox.

There are 96 Dell compute nodes, each with 24 Haswell cores and 128GB of DRAM, making 2,304 cores in total. Networking is 40Gbit/s Ethernet and 56Gbit/s Mellanox FDR InfiniBand, linking to a 10PB disk-based storage system which is replicated at Indiana University.

Wrangler hardware scheme

In total, the disk storage system consists of:

  • More than 20PB of raw disk for “project-term” storage
  • Around 75GB/s sequential write performance
  • Lustre-based file system with 34 OSS nodes and 272 storage targets
  • Exposed to users as a traditional filesystem and an iRODS-based data management system

But the working set data is placed in more than 500TB of DSSD flash storage, with access using the GPFS-based parallel file system, utilising multiple DSSD units.

TACC said the DSSD flash storage provides the truly “innovative capability” of Wrangler. Its bandwidth is 1TB/s and it pumps out more than 250 million IOPS.
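To put those headline figures in proportion, here is a rough back-of-envelope sketch in Python. It simply divides the quoted flash bandwidth and IOPS across the 96 nodes and 2,304 cores mentioned earlier. The even split and the decimal units (1TB = 10^12 bytes) are assumptions for illustration, not numbers from TACC, and the constant and function names are made up for this example.

```python
# Figures quoted in the article.
NODES = 96
CORES_PER_NODE = 24
FLASH_BANDWIDTH_BYTES_PER_S = 1e12  # 1TB/s; decimal units are an assumption
FLASH_IOPS = 250e6                  # "more than 250 million", so a lower bound


def even_share(total, units):
    """Split an aggregate figure evenly across units (an idealised assumption)."""
    return total / units


total_cores = NODES * CORES_PER_NODE
bandwidth_per_node_gb = even_share(FLASH_BANDWIDTH_BYTES_PER_S, NODES) / 1e9
iops_per_node = even_share(FLASH_IOPS, NODES)
iops_per_core = even_share(FLASH_IOPS, total_cores)

print(f"Total cores:            {total_cores}")
print(f"Flash bandwidth / node: {bandwidth_per_node_gb:.1f} GB/s")
print(f"Flash IOPS / node:      {iops_per_node:,.0f}")
print(f"Flash IOPS / core:      {iops_per_core:,.0f}")
```

Real workloads will not spread this evenly, so treat the per-node and per-core numbers as ceilings on an ideal day rather than measured results.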

This flash rack, which is described as a high-speed global file store accessible by all the nodes, is connected to the cores by a PCIe interconnect.

There is more than half a petabyte of usable space once RAID’ing has taken place.
