Our sister site The Next Platform has more details on the DGX-1, Nvidia's deep-learning system-in-a-box that crunches 42TFLOPS using double-precision math. It costs $129k, consumes 3.2kW, and uses eight Pascal-powered Tesla P100 GPUs plus Intel Xeon CPUs:
The system has two 16-core Xeon E5-2698 v3 processors, which run at 2.3 GHz and which are rated at 3 teraflops running in FP32 mode, according to Nvidia. The CPU has 512GB of DDR4 memory, which is a reasonable amount of main memory, so Nvidia is not skimping here.
The eight Tesla P100 accelerators have 16GB of HBM2 stacked memory on their package and are implemented on the mezzanine planar and linked in a hybrid cube mesh network to each other. Using half-precision FP16 data storage in the GPU memory, the eight Pascal GPUs can deliver 170 teraflops of aggregate performance for deep learning algorithms. The planar has a PCI-Express switch on it to link the GPUs to a pair of two-port 100Gb/sec InfiniBand adapters from Mellanox Technologies and a pair of 10Gb/sec Ethernet ports that come off the Xeon chips.
The system has four 1.92TB flash SSDs for high bandwidth storage, which is necessary to keep the CPUs and GPUs fed. The DGX-1 fits in a 3U enclosure and burns 3,200 watts across all of its components.
One DGX-1 can train an AI model using the AlexNet image library in two hours, a job that would take the dual-socket Xeon processors 150 hours to complete. That's because the P100 GPUs have ten times the memory bandwidth and crush the Xeons in terms of floating-point performance.
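For the curious, the headline numbers above can be turned into a few per-unit ratios. The following is a back-of-the-envelope sketch in Python, not a benchmark: every constant is a figure quoted in this article, the function name `summarize` and the dictionary keys are our own invention, and the per-GPU values assume the aggregate throughput is split evenly across the eight P100s.

```python
# Back-of-the-envelope ratios from the figures quoted in the article.
# Nothing here is measured; all inputs are vendor-quoted peak numbers.

GPUS = 8                # Tesla P100 accelerators in one DGX-1
FP16_TFLOPS = 170.0     # aggregate half-precision throughput
FP64_TFLOPS = 42.0      # aggregate double-precision throughput
PRICE_USD = 129_000     # list price
POWER_WATTS = 3_200     # whole-system power draw
XEON_HOURS = 150.0      # AlexNet training time on the dual-socket Xeons
DGX1_HOURS = 2.0        # AlexNet training time on the DGX-1


def summarize():
    """Return derived ratios (hypothetical helper, names are ours)."""
    return {
        # Assumes throughput divides evenly across the GPUs.
        "fp16_tflops_per_gpu": FP16_TFLOPS / GPUS,
        "fp64_tflops_per_gpu": FP64_TFLOPS / GPUS,
        "training_speedup": XEON_HOURS / DGX1_HOURS,
        "usd_per_fp16_tflops": PRICE_USD / FP16_TFLOPS,
        "fp16_gflops_per_watt": FP16_TFLOPS * 1000 / POWER_WATTS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

On these quoted figures the training speedup works out to 75x, and each GPU accounts for a little over 21 teraflops at half precision. Treat the dollar and watt ratios as rough guides only, since peak ratings rarely match sustained throughput.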
Check out Tim Prickett Morgan's article, DGX-1 Is Nvidia's Deep Learning System For Newbies, right here. ®