HPC

Wow, machine learning, what a snoozefest... less so if you strap a bunch of GPUs to your storage

GPU-boosted system market is, like, literally so hot right now


Analysis Machine learning stresses storage because training the models means millions, if not billions, of files have to be fed to the GPU-equipped training system as quickly as possible.

Suppliers are devising converged, hyperconverged and composable systems to sidestep chokepoints and make it simpler to get ML customers up and running.

Recently we have had the Pure Storage and Nvidia AIRI converged system, which brings four Nvidia DGX-1 GPU-enhanced servers to bear on FlashBlade-stored data.

This follows HPE's Apollo 6500 Gen10 and IBM's AC922 supercharged servers.

Now Nvidia has released an updated DGX, the DGX-2. Chinese server firm Inspur and composable infrastructure supplier Liqid have produced a Matrix Rack Composable Platform for machine learning while X-IO has added GPUs and SQream database software to its Axellio combined server+storage box.


Nvidia DGX-2

The DGX-2 is two DGX-1s plus more CPU, memory, interconnect bandwidth and storage:

|               | DGX-1                               | DGX-2                             | Notes                                |
|---------------|-------------------------------------|-----------------------------------|--------------------------------------|
| GPUs          | 8x V100                             | 16x V100                          |                                      |
| Interconnect  | NVlink                              | NVlink2 with 12 NVSwitches        | 216 ports                            |
| CPUs          | 2x 20-core Xeon E5-2698 v4 2.2GHz   | 2x Xeon Platinum                  | Faster CPUs                          |
| GPU Memory    | 256GB HBM                           | 512GB                             |                                      |
| System Memory | 512GB DDR4                          | 1.5TB                             | Triple pooled memory space           |
| Storage       | 4x 1.92TB SSD (7.68TB total)        | 30-60TB NVMe SSD                  | 4-8x more capacity                   |
| Performance   | 960 TFLOPS                          | 1,920 TFLOPS                      | Bigger memory pool means larger jobs |
| CUDA Cores    | 40,960                              | 81,920                            |                                      |
| Tensor Cores  | 5,120                               | 10,240                            |                                      |
| Weight        | 134lbs                              | 350lbs                            | More than 2x                         |
| Networking    | 4x EDR InfiniBand & 2x 10GbitE      | 8x EDR InfiniBand or 100GbitE     |                                      |
| Power         | 3.5kW                               | 10kW                              |                                      |
| Price         | $149,000                            | $399,000                          | More than 2x                         |

The much larger system memory means larger jobs can be run in the DGX-2. They should complete more than twice as fast because of this.
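To put numbers on "more than 2x", here is a minimal Python sketch that divides each DGX-2 figure by its DGX-1 counterpart. The values are copied from the comparison table; the DGX-2 system memory is read as 1.5TB (taken here as 1,536GB), which is what the "triple" note implies. The `specs` and `ratios` names are ours, purely for illustration.

```python
# DGX-1 vs DGX-2 figures copied from the comparison table.
# Each entry is (DGX-1 value, DGX-2 value).
specs = {
    "GPUs": (8, 16),
    "GPU memory (GB)": (256, 512),
    "System memory (GB)": (512, 1536),  # assumption: 1.5TB read as 1,536GB
    "Performance (TFLOPS)": (960, 1920),
    "Weight (lbs)": (134, 350),
    "Power (kW)": (3.5, 10),
    "Price ($)": (149_000, 399_000),
}


def ratios(table):
    """Return the DGX-2 / DGX-1 ratio for each row, rounded to two decimals."""
    return {name: round(dgx2 / dgx1, 2) for name, (dgx1, dgx2) in table.items()}


if __name__ == "__main__":
    for name, ratio in ratios(specs).items():
        print(f"{name:<22} {ratio:.2f}x")
```

On these figures GPU count, GPU memory and TFLOPS double and system memory triples, while weight, power and price grow by roughly 2.6x to 2.9x.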

With the DGX-2 being announced so close to the Pure-Nvidia AIRI system, it's clear that Pure and Nvidia decided not to have a DGX-2-based AIRI. However, it's possible that a subsequent AIRI system could be DGX-2-based, and have larger flash drives inside to keep the 16 GPUs occupied. This would be, we suppose, a $2m-plus system which would reduce the number of potential customers.
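As a rough sanity check on that $2m-plus guess, the sketch below prices the servers alone. It assumes a hypothetical DGX-2-based AIRI would keep the current system's four-server layout and pay list price; neither assumption comes from Pure or Nvidia, and the leftover figure is plain subtraction, not a FlashBlade price.

```python
DGX2_LIST_PRICE = 399_000    # DGX-2 price from the comparison table
SERVERS_IN_AIRI = 4          # assumption: same server count as today's DGX-1 AIRI
SYSTEM_ESTIMATE = 2_000_000  # the "$2m-plus" guess


def server_cost(count=SERVERS_IN_AIRI, unit_price=DGX2_LIST_PRICE):
    """Total list price of the GPU servers alone."""
    return count * unit_price


def remainder(estimate=SYSTEM_ESTIMATE):
    """What is left of the estimate for flash, networking and everything else."""
    return estimate - server_cost()


if __name__ == "__main__":
    print(f"Four DGX-2s at list price: ${server_cost():,}")
    print(f"Left over within $2m:      ${remainder():,}")
```

Under those assumptions the servers alone come to about $1.6m, so the "plus" depends on what the storage and networking would add.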

Inspur and Liqid

Inspur and Liqid have co-developed their Matrix Rack Composable Platform which lets users dynamically set up CPU-GPU-storage combinations composed for specific workloads. Inspur provides the i24 servers and GX4 chassis, Nvidia the Tesla V100 and P100 GPUs, and Liqid the Grid PCIe-based fabric hardware and software.


Start with a set of disaggregated pools of compute, GPU, storage and Ethernet networking resources. Elements from these pools can be combined, clustered, orchestrated and shared over the PCIe fabric.

The pool elements are:

  • 24x Compute Nodes (Dual Intel Xeon Scalable Processors)
  • 144x U.2 Solid-State Drives (SSD), 6.4 TB per SSD (922TB)
  • 24x Network Adapters (NIC), Dual 100 Gb/NIC
  • 48x NVIDIA GPUs (V100 and P100)
  • Liqid Grid (Managed PCIe Gen 3.0 Fabric) and Liqid Command Center (software)

Liqid Grid PCIe fabric switch

A maximally configured system might blow the Pure-Nvidia AIRI system away, and has three times as many GPUs as Nvidia's own DGX-2, although the Matrix Rack's 48 are a mix of V100s and P100s. The cost of such a fully configured Matrix Rack would be astronomical.
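The pool figures can be checked with a few lines of Python. This is a minimal sketch using only the numbers in the list above plus the DGX-2's 16 GPUs; it assumes decimal units (1TB = 1,000GB) and reads "Dual 100 Gb/NIC" as two 100Gbit/s ports per adapter. The per-node averages are only illustrative, since a composable system can hand out resources unevenly.

```python
# Matrix Rack pool sizes, from the list of pool elements.
COMPUTE_NODES = 24
SSD_COUNT = 144
SSD_CAPACITY_GB = 6_400       # 6.4TB per SSD, decimal units assumed
NIC_COUNT = 24
PORTS_PER_NIC = 2             # assumption: "Dual 100 Gb/NIC" means two ports
PORT_SPEED_GBIT = 100
GPU_COUNT = 48
DGX2_GPU_COUNT = 16           # from the DGX-2 comparison table


def pool_summary():
    """Return headline totals and even-split per-node averages for the rack."""
    return {
        "flash_tb": SSD_COUNT * SSD_CAPACITY_GB / 1000,
        "network_gbit": NIC_COUNT * PORTS_PER_NIC * PORT_SPEED_GBIT,
        "gpus_vs_dgx2": GPU_COUNT / DGX2_GPU_COUNT,
        "ssds_per_node": SSD_COUNT / COMPUTE_NODES,
        "gpus_per_node": GPU_COUNT / COMPUTE_NODES,
    }


if __name__ == "__main__":
    for key, value in pool_summary().items():
        print(f"{key:<15} {value}")
```

The flash total works out at 921.6TB, which rounds to the quoted 922TB, and 48 GPUs is three times the DGX-2's 16.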

Dolly Wu, GM and VP at Inspur Systems, said: "AI and deep learning applications will determine the direction of next-generation infrastructure design, and we believe dynamically composing GPUs will be central to these emerging platforms."

We might expect the other composable server system suppliers to add GPUs to their disaggregated pools too, namely Attala Systems, HPE with Synergy, DriveScale, and Intel with its RackScale product.

X-IO, SQream and Nvidia

Back on the more affordable side of planet Earth we have X-IO's Axellio edge compute+storage product receiving an Nvidia GPU implant and SQream database software to deliver a "converged appliance for extremely rapid data analytics of massive datasets".

What SQream has done with its DBMS software is to take repetitive low-level SQL query operations and run them on a server GPU accelerator. The company says complex queries contain multiple filters, type conversions, complex predicates, exotic join semantics, and subqueries. When these are run on 100TB-level datasets, with billions of rows in several tables, query latency can range from several minutes to hours.

SQream says it can provide a 20x speedup of queries on columnar data sets, and query large and complex data up to 100x faster than other relational databases. Its latency on complex queries against 100TB-level datasets is, it claims, in seconds-to-minutes territory.

Its ingest speed is up to 2TB/hour.

This enables a large-scale reduction in the number of servers needed to run SQL queries on large data sets; SQream claims a single 2U server plus GPU is equivalent to a 42U rack full of servers. Basically, SQream says use our relational database to get screaming SQL performance.

Then X-IO says run it on our hardware and go faster still.

The server/storage base is X-IO's Axellio Edge Micro-Datacenter appliance product: a 2U box containing two Xeon server modules with two Xeons apiece, 2x Tesla P100 GPUs, a PCIe fabric, and 1 to 6 FlashPacs, which each hold up to 12x dual-port NVMe SSDs (800, 1,600, 3,200 or 6,400GB), with a maximum capacity of 500TB.

SQream and X-IO claim a two-node example of their combined system can push data from storage to the GPU at up to 3.2GB/sec per GPU. Their combined system can reach 11.5TB/hour in an analytics run.
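To relate the per-second and per-hour figures, here is a minimal Python sketch of the unit conversion. It assumes decimal units (1TB = 1,000GB) and a sustained rate, and the function names are ours. It shows only that 3.2GB/sec, held for an hour, comes to about 11.5TB; it does not claim that is how the vendors arrived at their number.

```python
SECONDS_PER_HOUR = 3600
GB_PER_TB = 1000  # assumption: decimal units


def gb_per_sec_to_tb_per_hour(gb_per_sec):
    """Convert a sustained GB/sec rate into TB/hour."""
    return gb_per_sec * SECONDS_PER_HOUR / GB_PER_TB


def hours_to_process(dataset_tb, tb_per_hour):
    """Hours needed to move a dataset of the given size at a steady rate."""
    return dataset_tb / tb_per_hour


if __name__ == "__main__":
    rate = gb_per_sec_to_tb_per_hour(3.2)
    print(f"3.2GB/sec sustained is {rate:.2f}TB/hour")
    print(f"100TB at 11.5TB/hour: {hours_to_process(100, 11.5):.1f} hours")
    print(f"100TB at 2TB/hour:    {hours_to_process(100, 2):.1f} hours")
```

At the quoted rates, a full pass over a 100TB dataset would take under nine hours on the combined system, against 50 hours at SQream's stand-alone 2TB/hour ingest figure.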

They say users can get real-time answers to queries that took minutes before, or expand their query windows from weeks to years to find trends, query trillions of rows of data and get results faster.

X-IO might also be looking at the machine learning space. In theory it would be easy enough to climb into bed with a machine learning framework software supplier. Just another partnership, right?


Salivating

Machine learning is seen as a hot growth market. Combine that with on-premises NVMe flash storage and big data analytics applications, and the result is hot boxes galore.

We must surely expect Dell EMC and NetApp to enter the GPU-boosted system market, not to mention Huawei and Lenovo. Other all-flash array vendors, such as Kaminario, Tintri and WDC Tegile, might look at the Pure-Nvidia deal and think "me too".

The performance gains over non-GPU systems are so impressive that profit margins can be set high enough to get on-commission sales reps salivating like crazy. This GPU-accelerated server/storage product development space is going to see frenzied development as suppliers pile in to take advantage of the growth prospects. ®


