DDN steps out of HPC niche and into enterprise AI systems hurly-burly
Squares up to Pure, NetApp, Cisco and Dell EMC
HPC supplier DDN has joined the small but growing crowd of firms swimming out into the enterprise AI mainstream, twinning its storage with Nvidia's DGX-1 GPU server.
DDN has made several recent moves to expand its enterprise storage credentials, buying Tintri to sell enterprise arrays and now entering the AI systems arena with its A3I-branded scalable reference architecture products.
DDN A3I scalable reference architecture
The enterprise AI systems market was first established by Pure with AIRI, which twins its FlashBlade array with the DGX-1. Then NetApp joined in with its A700 all-flash array/DGX-1 combo, followed by the faster A800/DGX-1 setup. Dell EMC pitched in with a Ready Solution for AI: Deep Learning, as did Cisco with its C480 AI/machine learning server.
All these systems involve all-flash storage and DDN has its AI200 and AI400 all-flash systems plus a hybrid flash/disk AI7990. All three run DDN's Exascaler software, which is a Lustre-based parallel file system.
The AI200 holds up to 360TB of flash across 24 dual-ported NVMe drives inside its 2U enclosure, and hooks up to the DGX-1 with either 4 x EDR InfiniBand (EDR IB) or 100Gbit/s Ethernet (100GbitE). It delivers up to 20GB/sec of file system sequential read throughput and over 1 million IOPS.
The AI400 uses the same enclosure and delivers up to 40GB/sec of sequential read throughput and up to 3 million IOPS. It has 8 x EDR InfiniBand ports or 100GbitE ones, and the same flash capacity.
The larger AI7990, in a 4U cabinet, drops back to 20GB/sec sequential read performance and provides up to 700,000 IOPS. It supports 90 x 3.5-inch slots for SSDs and disk drives. There can be up to 4 expansion chassis, each with 90 bays, providing up to 5.6PB of capacity.
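For a rough sense of what those capacity figures imply per drive, here is a back-of-envelope sketch in Python. It uses only the numbers quoted above and makes two assumptions that DDN has not confirmed: that capacities are in decimal units (1PB = 1,000TB), and that the 5.6PB maximum covers the base chassis plus all four expansion chassis. The function and constant names are illustrative only.

```python
# Back-of-envelope arithmetic from the quoted figures.
# Assumptions (not from DDN): decimal units, and the 5.6PB maximum
# spans the base chassis plus all four expansion chassis.

AI200_CAPACITY_TB = 360       # quoted maximum flash capacity
AI200_DRIVES = 24             # quoted NVMe drive count

AI7990_BAYS_PER_CHASSIS = 90  # quoted 3.5-inch slots per chassis
AI7990_EXPANSIONS = 4         # quoted maximum number of expansion chassis
AI7990_CAPACITY_PB = 5.6      # quoted maximum capacity


def tb_per_drive(capacity_tb: float, drives: int) -> float:
    """Average capacity each drive must hold to reach the quoted maximum."""
    return capacity_tb / drives


def total_bays(bays_per_chassis: int, expansions: int) -> int:
    """Bays in the base chassis plus every expansion chassis."""
    return bays_per_chassis * (1 + expansions)


ai200_tb = tb_per_drive(AI200_CAPACITY_TB, AI200_DRIVES)
ai7990_bays = total_bays(AI7990_BAYS_PER_CHASSIS, AI7990_EXPANSIONS)
ai7990_tb = tb_per_drive(AI7990_CAPACITY_PB * 1000, ai7990_bays)

print(f"AI200: {ai200_tb:.1f}TB per drive")
print(f"AI7990: {ai7990_bays} bays, {ai7990_tb:.1f}TB per bay")
```

On those assumptions the AI200 maximum works out to 15TB per NVMe drive, and a fully expanded AI7990 has 450 bays averaging a little over 12TB each.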
A downloadable DDN A3I Solutions Brief provides examples of A3I system use with AI software such as Resnet-50 and Resnet-152, Caffe GoogleNet and Inception V3.
An A3I solutions guide provides configuration examples of an AI200 with 9 x DGX-1 servers and system performance using TensorFlow, Horovod, TensorRT, Torch, PyTorch and other AI frameworks.
Resnet-152 results at varying GPU counts. Not all suppliers provide values at each GPU count level, which is why there are gaps in the chart
In the Resnet-152 and Resnet-50 tests, the AI200 tested faster than competing Pure, NetApp and Dell EMC systems. Cisco has not provided any public performance information for its AI system.
Resnet-50 results. Again, not all suppliers provide values at each GPU count level
Get AI200 and AI7990 datasheets here. ®