AMD unveils its MI100 GPU, said to be its most powerful silicon for supercomputers, high-end AI processing

Chip takes aim at Nvidia's A100

AMD announced on Monday its Instinct MI100 accelerator, a GPU aimed at speeding up AI software and math-heavy workloads for supercomputers and high-end servers.

This is a 7nm TSMC-fabricated GPU code-named Arcturus, and is the first to feature AMD's CDNA architecture. We're told the hardware features 120 compute units and 7,680 stream processors capable of performing up to 11.5 TFLOPS at FMA64 and FP64 precision. The silicon peaks at 184.6 TFLOPS for FP16 matrix operations, and 92.3 TFLOPS for trendy bfloat16 math, AMD boasted. It ships on a PCIe card.
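For the curious, those headline numbers can be cross-checked against one another with a few lines of Python. This is a back-of-the-envelope sketch, not anything AMD supplied: the function names are ours, and the implied clock speed rests on our own assumption that each stream processor averages one FP64 FLOP per clock (a fused multiply-add every other cycle), so treat that result as an estimate.

```python
# Figures quoted by AMD for the MI100
COMPUTE_UNITS = 120
STREAM_PROCESSORS = 7_680
FP64_TFLOPS = 11.5
FP16_MATRIX_TFLOPS = 184.6
BF16_TFLOPS = 92.3

# ASSUMPTION (ours, not from AMD's briefing): FP64 FMA runs at half rate,
# so each stream processor averages one FP64 FLOP per clock cycle.
ASSUMED_FP64_FLOPS_PER_SP_PER_CLOCK = 1.0


def stream_processors_per_cu() -> int:
    """Stream processors in each compute unit."""
    return STREAM_PROCESSORS // COMPUTE_UNITS


def implied_clock_ghz() -> float:
    """Clock speed needed to hit the quoted FP64 peak, under the assumption above."""
    flops_per_second = FP64_TFLOPS * 1e12
    per_clock = STREAM_PROCESSORS * ASSUMED_FP64_FLOPS_PER_SP_PER_CLOCK
    return flops_per_second / per_clock / 1e9


def fp16_to_bf16_ratio() -> float:
    """How much faster the quoted FP16 matrix peak is than the bfloat16 peak."""
    return FP16_MATRIX_TFLOPS / BF16_TFLOPS


if __name__ == "__main__":
    print(f"Stream processors per compute unit: {stream_processors_per_cu()}")
    print(f"Implied peak clock (assumed half-rate FP64): {implied_clock_ghz():.2f} GHz")
    print(f"FP16 matrix vs bfloat16 peak: {fp16_to_bf16_ratio():.1f}x")
```

On the quoted figures, the FP16 matrix peak works out to twice the bfloat16 peak, and the stream processors divide evenly across the compute units.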

For a full analysis and commentary, see our sister site The Next Platform's coverage: AMD at a tipping point with Instinct MI100 GPU accelerators.

“Today AMD takes a major step forward in the journey toward exascale computing as we unveil the AMD Instinct MI100 – the world’s fastest HPC GPU,” claimed Brad McCredie, AMD's corporate veep of datacenter GPU and accelerated processing. “Squarely targeted toward the workloads that matter in scientific computing, our latest accelerator, when combined with the AMD ROCm open software platform, is designed to provide scientists and researchers a superior foundation for their work in HPC.”


The MI100 accelerator card ... Source: AMD

AMD did not reveal the number of transistors or the die size to reporters in a briefing last week. The specs that are public, however, show that each chip uses PCIe 4 to interface with its host, contains 32GB of HBM2 memory, can sustain a maximum of 1.2TB per second of memory bandwidth, and has a max TDP of 300W. Each card is also capable of shuttling up to 340GB per second over three AMD Infinity Fabric links.

The MI100 accelerator is designed to compete against Nvidia’s latest A100 GPUs. However, the A100 has more RAM and memory bandwidth (up to 80GB and 2,039 GB/s). And while the MI100 has a higher base FP64 performance (11.5 TFLOPS v A100's 9.7 TFLOPS), when using Tensor Cores, the A100's FP64 performance rockets to 19.5 TFLOPS. The A100 has higher performance at lower precision, too, and has a higher max TDP.
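To put that comparison in proportion, here is a rough sketch that turns the figures quoted in this article into ratios, plus a crude FP64 "FLOPs per byte" balance for each part. The dictionary keys and helper names are our own invention, the A100 numbers are the up-to values cited above, and peak specs say little about real-world application performance.

```python
# Peak figures as quoted in this article; bandwidth in GB/s (1.2TB/s = 1,200GB/s)
MI100 = {"fp64_tflops": 11.5, "memory_gb": 32, "bandwidth_gbs": 1200}
A100 = {
    "fp64_tflops": 9.7,          # without Tensor Cores
    "fp64_tensor_tflops": 19.5,  # with Tensor Cores
    "memory_gb": 80,
    "bandwidth_gbs": 2039,
}


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio, kept as a helper so the comparisons below read clearly."""
    return numerator / denominator


def flops_per_byte(tflops: float, bandwidth_gbs: float) -> float:
    """Peak FP64 FLOPs available per byte of peak memory traffic."""
    return (tflops * 1e12) / (bandwidth_gbs * 1e9)


def compare() -> dict:
    """Collect the headline ratios between the two accelerators."""
    return {
        "mi100_vs_a100_base_fp64": ratio(MI100["fp64_tflops"], A100["fp64_tflops"]),
        "a100_tensor_vs_mi100_fp64": ratio(A100["fp64_tensor_tflops"], MI100["fp64_tflops"]),
        "a100_vs_mi100_memory": ratio(A100["memory_gb"], MI100["memory_gb"]),
        "a100_vs_mi100_bandwidth": ratio(A100["bandwidth_gbs"], MI100["bandwidth_gbs"]),
        "mi100_fp64_flops_per_byte": flops_per_byte(MI100["fp64_tflops"], MI100["bandwidth_gbs"]),
        "a100_fp64_flops_per_byte": flops_per_byte(A100["fp64_tflops"], A100["bandwidth_gbs"]),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.2f}")
```

On these numbers the MI100 leads by roughly a fifth on base FP64, while the A100 is ahead by about 1.7x on Tensor Core FP64 and memory bandwidth, and 2.5x on memory capacity.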

AMD reckons its MI100 accelerator will offer customers a cheaper pathway toward building exascale supercomputers, though, by offering more performance per dollar compared to the A100. The hardware is supported by AMD’s ROCm 4.0 open-source platform, which can accelerate the machine-learning frameworks PyTorch and TensorFlow.

Designed to be used alongside AMD’s Epyc server processors, the MI100 GPUs are expected to crunch through heavy machine-learning workloads and simulations for things like climate modelling, astrophysics, and fluid dynamics. The MI100 will be available through various vendors, including HPE, Dell, Supermicro, and Gigabyte, and is expected to start shipping this month. ®

Biting the hand that feeds IT © 1998–2020