Pure Storage and Nvidia have produced a converged machine-learning system to train AI models using millions of data points.
The system, called AIRI, is said to be easier to buy, deploy, and operate than sourcing and integrating the components separately: the standard converged infrastructure pitch.
AIRI's rack is meant to be an object of desire in your data centre.
FlashBlade is Pure Storage’s all-solid-state storage array for fast access to unstructured data. It is a 4U box containing 15 vertically mounted blades or object node servers. Each blade has a multi-core Intel Xeon CPU and 17TB of flash, totaling 255TB raw, or around 523TB effective capacity after data reduction. Each blade also includes a pair of ARM cores, an FPGA, NVRAM, and PCIe-based networking. It is definitely not a commodity SSD-based system.
This is a powerful parallel-access flash array, and, in the AIRI, it has to feed more than 160,000 GPU cores.
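As a quick check on the FlashBlade capacity figures quoted above, the short Python sketch below recomputes the raw total and the data reduction ratio that the effective figure implies. The variable names are ours, and the ratio is inferred from the two quoted capacities rather than stated by Pure.

```python
# FlashBlade figures as quoted in the article
BLADES = 15
FLASH_PER_BLADE_TB = 17
EFFECTIVE_TB = 523  # "around 523TB" after data reduction

raw_tb = BLADES * FLASH_PER_BLADE_TB
# Inferred, not a Pure-published number: effective capacity / raw capacity
implied_reduction_ratio = EFFECTIVE_TB / raw_tb

print(f"raw capacity: {raw_tb}TB")
print(f"implied data reduction: {implied_reduction_ratio:.2f}:1")
```

On these numbers the effective capacity assumes roughly 2:1 data reduction.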
FlashBlade supplies data to the four DGX-1 systems, which are Intel Xeon-based servers each fitted with eight Tesla V100 GPUs interlinked with NVLink. These are seriously powerful GPUs, each with 5,120 CUDA cores, 640 Tensor cores, a 16MB cache and a 16GB HBM2 memory bank with a bandwidth of 900GB/sec. A V100 maxes out at 7.5TFLOPS using 64-bit floating-point math and 15TFLOPS using 32-bit.
With its 32 V100s, an AIRI has 163,840 CUDA cores and 20,480 Tensor cores at its disposal. We were told it provides four Tensor PFLOPS. One requirement is that applications, meaning machine-learning jobs, have to run in containers.
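The core totals are straightforward multiplication, sketched below in Python with names of our own choosing. The per-GPU Tensor throughput is not among the figures quoted here; the last calculation merely backs it out of the four-PFLOPS system figure we were given.

```python
# Per-system and per-GPU figures as quoted in the article
DGX1_COUNT = 4
GPUS_PER_DGX1 = 8
CUDA_CORES_PER_V100 = 5_120
TENSOR_CORES_PER_V100 = 640
TENSOR_PFLOPS_QUOTED = 4  # system-wide figure supplied to us

def airi_gpu_totals(dgx_count=DGX1_COUNT):
    """Return GPU and core counts for a given number of DGX-1s."""
    gpus = dgx_count * GPUS_PER_DGX1
    return {
        "gpus": gpus,
        "cuda_cores": gpus * CUDA_CORES_PER_V100,
        "tensor_cores": gpus * TENSOR_CORES_PER_V100,
    }

totals = airi_gpu_totals()
# Derived here, not quoted: Tensor TFLOPS each GPU must deliver to reach 4 PFLOPS
implied_tensor_tflops_per_gpu = TENSOR_PFLOPS_QUOTED * 1_000 / totals["gpus"]

print(totals)
print(f"implied Tensor TFLOPS per V100: {implied_tensor_tflops_per_gpu:.0f}")
```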
To connect to the outside world, there are a couple of Arista 100GbE switches supporting GPUDirect RDMA. This enables a direct and fast path for data transmission between the GPUs and FlashBlade using PCIe features. The interface between FlashBlade and the DGX-1s is file-based: NFS.
The specific Arista products used weren’t revealed.
There are two included software items:
- Nvidia’s GPU Cloud Deep Learning Stack.
- The AIRI scaling toolkit, which is a configuration validator and multi-node training management system.
Pure said this software should help data scientists to get machine-learning projects up and running in hours, and not days or weeks.
HPE’s Apollo 6500 Gen10 can crunch up to 125 TFLOPS using single-precision floating-point math. A single V100 manages about 15 TFLOPS on that measure, meaning one DGX-1 will perform up to roughly 120 TFLOPS, and an AIRI with four of them up to 480 TFLOPS. It’s in a different league.
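The comparison with the HPE box can be reproduced with the rough Python sketch below. It uses only the rounded single-precision figures quoted above; the function and constant names are ours, and real-world throughput will depend on workload.

```python
# Rounded single-precision (FP32) peak figures as quoted in the article
V100_FP32_TFLOPS = 15
APOLLO_6500_FP32_TFLOPS = 125
GPUS_PER_DGX1 = 8

def dgx_fp32_tflops(dgx_count):
    """Peak FP32 TFLOPS for a number of DGX-1s, using the rounded V100 figure."""
    return dgx_count * GPUS_PER_DGX1 * V100_FP32_TFLOPS

one_dgx1 = dgx_fp32_tflops(1)
full_airi = dgx_fp32_tflops(4)
airi_vs_apollo = full_airi / APOLLO_6500_FP32_TFLOPS

print(f"one DGX-1: {one_dgx1} TFLOPS")
print(f"four-DGX-1 AIRI: {full_airi} TFLOPS")
print(f"AIRI vs Apollo 6500: {airi_vs_apollo:.2f}x")
```

On these peak numbers a single DGX-1 sits just under the Apollo 6500 figure; the gap opens up once all four DGX-1s are counted, at a little under four times the HPE system.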
IBM’s AC922 supports up to six Tesla V100 GPUs, two fewer than a single DGX-1, and AIRI has four DGX-1s. Again, it’s in a different league from the IBM system.
Pure said multiple training jobs can run in parallel and complete faster on AIRI than with other systems, with run time cut to a quarter of what it would otherwise be.
What about the price? Pure wouldn’t provide one, saying pricing was down to the channel supplier but would reflect the cost of the components.
Back-of-the-envelope math says four DGX-1s will cost around $600,000. The Arista switches cost, say, $3,000-plus apiece, while FlashBlade costs under $1/effective GB, meaning, with its 523TB effective capacity, it will cost less than $523,000.
We are looking at a million-bucks-plus system here – a large enterprise or specialized customer purchase. The system as introduced supports four DGX-1s but deployments might start with one to two DGX-1s, which would lessen the upfront cost.
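For anyone wanting to rerun the sums, here is a minimal Python sketch of the estimate. Every input is one of the rough figures above, not a quoted price. It also makes assumptions of our own: "a couple" of switches is taken as two, the per-DGX-1 price is simply a quarter of the four-system figure, and smaller starter configurations are assumed to ship with the same full FlashBlade.

```python
# Rough figures from the article; each is an estimate, not a vendor quote
FOUR_DGX1_USD = 600_000                 # "around $600,000" for four systems
SWITCH_USD = 3_000                      # "$3,000-plus apiece", so a floor
SWITCH_COUNT = 2                        # assumption: "a couple" taken as two
FLASHBLADE_USD_PER_EFFECTIVE_GB = 1.0   # "under $1", so a ceiling
FLASHBLADE_EFFECTIVE_TB = 523

def airi_estimate_usd(dgx_count=4):
    """Very rough system cost in dollars for a given number of DGX-1s."""
    dgx = FOUR_DGX1_USD / 4 * dgx_count
    switches = SWITCH_USD * SWITCH_COUNT
    # The estimate treats 1TB as 1,000GB, as the article's arithmetic does
    storage = FLASHBLADE_USD_PER_EFFECTIVE_GB * FLASHBLADE_EFFECTIVE_TB * 1_000
    return dgx + switches + storage

for n in (1, 2, 4):
    print(f"{n} x DGX-1: about ${airi_estimate_usd(n):,.0f}")
```

Because the switch price is a floor and the storage price a ceiling, the output is a ballpark rather than a bound in either direction.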
Customers will be organisations that need to run lots of large-scale machine learning jobs to train models on millions of data items.
Each DGX-1 draws 3.2kW of power, so you're looking at near enough 13kW before factoring in storage and networking and the Intel compute silicon.
AIRI is available now through selected reseller partners, such as ePlus Technology, FusionStorm, GroupWare Technology, PNY, Trace3, World Wide Technology and Xenon. ®