AI, caramba: NetApp pits scaly A800 ONTAP beast against Pure's AIRI fairy

Flashes next-gen unit to compete with FlashBlade box


The clash of the million-dollar AI titans resumes. NetApp has designed an ONTAP AI architecture based on its topline A800 flash array and Nvidia DGX-1 to try to win fat-pocketed customers away from Pure and Nvidia's AIRI AI system-in-a-box.

Nvidia and NetApp collaborated on a reference architecture in June – and provided AI performance results using NetApp's A700 flash array.

Pure Storage provided similar results for its AIRI system, which combines Nvidia GPUs with Pure's FlashBlade array, and beat the A700/Nvidia system.

The ONTAP AI documentation (PDF) spells out the system components and provides more performance data showing the A800 feeding data to Nvidia's GPUs, which is, for now, faster than the FlashBlade array in Pure's AIRI system.

ONTAP AI has pre-validated designs with product available through NetApp channel partners. The main components are:

  • NetApp A800 with 48 x 1.92TB NVMe SSDs
  • Nvidia DGX-1 with 8 x Tesla V100 graphics processing units (GPUs)
  • Four Mellanox ConnectX-4 single-port network interface cards per DGX-1
  • Cisco Nexus 3232C 100Gb Ethernet switch

It uses a high-availability design with redundant storage and network and server connections.

The entry point is a 1:1 A800:DGX-1 configuration, and it can scale out to a 1:5 configuration and beyond. A 1:5 config has five DGX-1 servers fed by one A800 high-availability (HA) pair via two switches.


Each DGX-1 server connects to each of the two switches via two 100GbitE links. The A800 connects via four 100GbitE links to each switch. The switches can have two to four 100Gbit inter-switch links, designed for failover scenarios. The HA design is active-active.

NetApp A800 and A700 systems can scale from two nodes (364.8TB) to a 24-node (12 HA pairs) cluster (74.8PB with A800; 39.7PB with A700s).

A single A800 system supports throughput of 25GB/sec for sequential reads and 1 million IOPS for small random reads at sub-500μs latencies. A full A800 cluster can feed data to the DGX-1 at 300GB/sec. NetApp said an A800 HA pair has been proven to support up to 25GB/sec under 1ms latency for NAS workloads – the ones used here.
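The 300GB/sec cluster figure follows directly from the per-HA-pair number — a quick back-of-the-envelope sketch, assuming the 25GB/sec per pair scales linearly across a full 24-node cluster:

```python
# Back-of-the-envelope check of NetApp's aggregate-throughput claim.
GB_PER_SEC_PER_HA_PAIR = 25   # proven NAS throughput per A800 HA pair
MAX_NODES = 24                # maximum A800 cluster size
ha_pairs = MAX_NODES // 2     # each HA pair is two nodes -> 12 pairs
aggregate = ha_pairs * GB_PER_SEC_PER_HA_PAIR
print(f"{aggregate} GB/sec")  # 300 GB/sec, matching the quoted cluster figure
```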

In comparison, NetApp's A700s system supports multiple 40GbitE links to deliver a maximum throughput of 18GB/sec. The A800 system also supports 40GbitE.

NFS and RoCE

The DGX-1 supports 100GbitE RDMA over Converged Ethernet (RoCE) for its cluster interconnect.

However, the A800 uses NFS, not RDMA, to send data to the DGX-1s across the Nexus switch. The switch's ability to prioritise RoCE over all other traffic allows the 100GbitE links to carry both RoCE and traditional IP traffic, such as NFSv3 storage access traffic.

Multiple virtual LANs (VLANs) are provisioned to support both RoCE and NFS storage traffic. Four VLANs are dedicated to RoCE traffic, and two VLANs are dedicated to NFS storage traffic.

To increase data access performance, multiple NFSv3 mounts are made from the DGX-1 server to the storage system. Each DGX-1 server is configured with two NFS VLANs, with an IP interface on each VLAN. The FlexGroup volume on the AFF A800 system is mounted on each of these VLANs on each DGX-1, providing completely independent connections from the server to the storage system.
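That mount layout can be sketched as follows — note the VLAN IDs, data LIF addresses and volume path here are hypothetical placeholders, not values from NetApp's paper:

```python
# Illustrative sketch: one NFSv3 mount of the same FlexGroup volume per
# storage VLAN on a DGX-1, giving independent server-to-storage paths.
# VLAN IDs, IP addresses and the junction path are hypothetical.
NFS_VLANS = {            # VLAN ID -> A800 data LIF reachable on that VLAN
    3111: "192.168.111.10",
    3112: "192.168.112.10",
}
FLEXGROUP = "/imagenet"  # exported FlexGroup junction path

def mount_commands(vlans=NFS_VLANS, volume=FLEXGROUP):
    """Build the mount commands a DGX-1 would run, one per storage VLAN."""
    cmds = []
    for vlan, lif in sorted(vlans.items()):
        mountpoint = f"/mnt/nfs{vlan}"
        cmds.append(f"mount -t nfs -o vers=3 {lif}:{volume} {mountpoint}")
    return cmds

for cmd in mount_commands():
    print(cmd)
```

Each mount rides a different VLAN interface, so the two data paths never share a link.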

Containers containers containers

The DGX-1 server leverages GPU-optimised software containers from Nvidia GPU Cloud (NGC), including containers for all of the most popular deep learning (DL) frameworks. The NGC deep learning containers are pre-optimised at every layer, including drivers, libraries and communications primitives.

Trident is NetApp's dynamic storage orchestrator for containers, fully integrated with Docker and Kubernetes. Combined with NGC and popular orchestrators such as Kubernetes or Docker Swarm, it lets customers deploy their AI/DL NGC container images onto NetApp storage.

Performance

NetApp's technical paper contains performance information, with a summary chart showing results as the number of GPUs increase:

[Chart: ONTAP AI performance scaling as the number of GPUs increases]

Everything appears to scale well. No numbers are provided, but we've carefully inferred them for the Resnet-50 and Resnet-152 categories from the chart and tabulated them alongside the known Pure AIRI and A700 numbers.

For now, the A800 numbers overlap the A700 and Pure AIRI numbers at the 8-GPU level and then scale out through 16 to 32 GPUs. Upcoming work by Pure could well provide company in the latter cells:

Resnet-50:

              1 GPU  2 GPUs  4 GPUs  8 GPUs  16 GPUs  32 GPUs
Pure AIRI       346     667    1335    2540        -        -
NetApp A700     321     613    1131    2048        -        -
NetApp A800       -       -       -    6000    11200    22500

Resnet-152:

              1 GPU  2 GPUs  4 GPUs  8 GPUs  16 GPUs  32 GPUs
Pure AIRI       146     287     568    1122        -        -
NetApp A700     136     266     511     962        -        -
NetApp A800       -       -       -    2400     4100     9000

Here are the previous Pure and NetApp charts.

There is no price/performance data for ONTAP AI but we imagine millions of dollars are involved. AI at this level does not come cheap. ®

