HPC supplier DDN has joined the small but growing crowd of firms swimming out into the enterprise AI mainstream, twinning its storage with Nvidia's DGX-1 GPU server.
DDN has made several recent moves to expand its enterprise storage credentials, buying Tintri to sell enterprise arrays and now entering the AI systems arena with its A3I-branded scalable reference architecture products.
DDN A3I scalable reference architecture
The enterprise AI systems market was first established by Pure with AIRI, which twins its FlashBlade array with the DGX-1. Then NetApp joined in with its A700 all-flash array/DGX-1 combo, followed by the faster A800/DGX-1 setup. Dell EMC pitched in with a Ready Solution for AI: Deep Learning, as did Cisco with its C480 AI/machine learning server.
All these systems involve all-flash storage, and DDN has its AI200 and AI400 all-flash systems plus a hybrid flash/disk AI7990. Both the all-flash and hybrid systems run DDN's Exascaler software, which is a Lustre-based parallel file system.
The AI200 holds up to 360TB of capacity across 24 dual-ported NVMe flash drives inside its 2U enclosure, and hooks up to the DGX-1 with either 4 x EDR InfiniBand (EDR IB) or 100Gbit/s Ethernet (100GbitE). It delivers up to 20GB/sec of file system sequential read throughput and over 1 million IOPS.
The AI400 uses the same enclosure and delivers up to 40GB/sec of sequential read throughput and up to 3 million IOPS. It has 8 x EDR InfiniBand ports or 100GbitE ones, and the same flash capacity.
The larger AI7990, in a 4U cabinet, drops back to 20GB/sec of sequential read throughput and provides up to 700,000 IOPS. It has 90 x 3.5-inch slots for SSDs and disk drives. There can be up to 4 expansion chassis, each with 90 bays, providing up to 5.6PB of capacity.
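For readers who want to sanity-check the quoted figures, here is a rough back-of-the-envelope sketch. It assumes decimal units (1PB = 1,000TB), that the 5.6PB maximum covers the base AI7990 enclosure plus all four expansion chassis, and that each EDR InfiniBand port runs at its nominal 100Gbit/s; the function names are illustrative only and the results are averages, not DDN-published per-drive specifications.

```python
# Back-of-the-envelope checks on the quoted A3I figures.
# Assumptions (not from DDN): decimal units, and the 5.6PB maximum
# spans the base enclosure plus four expansion chassis.

def tb_per_drive(total_tb: float, drive_count: int) -> float:
    """Average capacity per drive bay, in TB."""
    return total_tb / drive_count

def link_gbytes_per_sec(ports: int, gbit_per_port: float = 100.0) -> float:
    """Raw aggregate line rate in GB/sec, ignoring protocol overhead."""
    return ports * gbit_per_port / 8

# AI200: 360TB across 24 NVMe drives
ai200_drive_tb = tb_per_drive(360, 24)

# AI7990: 90 bays in the base unit plus 4 expansion chassis of 90 bays
ai7990_bays = 90 * (1 + 4)
ai7990_bay_tb = tb_per_drive(5600, ai7990_bays)

# Raw network ceiling: 4 ports on the AI200, 8 on the AI400
ai200_link = link_gbytes_per_sec(4)
ai400_link = link_gbytes_per_sec(8)

print(f"AI200 average drive size: {ai200_drive_tb:.1f} TB")
print(f"AI7990 bays: {ai7990_bays}, average per bay: {ai7990_bay_tb:.1f} TB")
print(f"AI200 raw link rate: {ai200_link:.0f} GB/sec")
print(f"AI400 raw link rate: {ai400_link:.0f} GB/sec")
```

On these assumptions, the quoted 20GB/sec and 40GB/sec read rates sit well under the raw line rate of the network ports, so the ports themselves are not the limiting factor in the headline numbers.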
A downloadable DDN A3I Solutions Brief provides examples of A3I system use with AI software such as Resnet-50 and Resnet-152, Caffe GoogleNet, and Inception V3.
An A3I solutions guide provides configuration examples of an AI200 with 9 x DGX-1 servers and system performance using Tensorflow, Horovod, TensorRT, Torch, PyTorch and other AI frameworks.
Resnet-152 results at varying GPU counts. Not all suppliers provide values at each GPU count level, which is why there are gaps in the chart.
In the Resnet-152 and Resnet-50 tests, the AI200 tested faster than competing Pure, NetApp and Dell EMC systems. Cisco has not provided any public performance information for its AI system.
Resnet-50 results. Again, not all suppliers provide values at each GPU count level
Get AI200 and AI7990 datasheets here. ®