UV 2: RETURN of the 'Big Brain'. This time, it's affordable

Hefty loads bursting out of your box? Try this


Silicon Graphics is betting big on Intel's latest Xeon E5-4600 processor and its own revved up NUMAlink 6 shared memory interconnect, creating a "big brain computer" that can gang up to 4,096 cores into a single system image to run massive Linux workloads and fairly large Windows jobs, too. The new UV 2 is exactly the kind of box, says SGI, that customers with big data warehouse, big database, big data, and traditional HPC workloads have always wanted – and in many cases could never have afforded.
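
To make "single system image" concrete: one Linux kernel boots across the whole machine, so an ordinary process can see every core and every byte of shared memory through the standard OS interfaces. Here is a minimal sketch of that view, assuming a Linux system exposing the usual /sys and /proc topology files:

```python
#!/usr/bin/env python3
# Minimal sketch: on a shared memory box like the UV 2, a single Linux kernel
# sees every core and all of the RAM, so one process can size up the whole
# machine with nothing but the standard /sys and /proc interfaces.
import glob
import os

# Logical CPUs visible to this one OS image (up to 4,096 on a full UV 2000).
print("cores visible:", os.cpu_count())

# NUMA nodes exported by the kernel; each maps to memory near a socket.
nodes = glob.glob("/sys/devices/system/node/node[0-9]*")
print("NUMA nodes:", len(nodes))

# Total addressable RAM, from the MemTotal line of /proc/meminfo (in kB).
with open("/proc/meminfo") as f:
    total_kb = int(f.readline().split()[1])
print(f"shared memory: {total_kb / 1024**2:.1f} GB")
```

No MPI, no clustering middleware: the same script runs unchanged on a laptop or on a 4,096-core UV 2000, which is the whole point of the shared memory design.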

But the shift from Itanium and then Xeon E7 chips to new packaging and lower-cost Xeon E5 processors has made SGI's shared memory systems more broadly accessible, at just the same time that many workloads seem to be busting out of general-purpose four-socket boxes. This is good news for SGI, which has had its share of financial woes as it chases the capricious and fiercely competitive HPC and hyperscale data center markets.

SGI will also be pleased to note that Intel has not yet got interconnect fabrics woven into its Xeon processors and chipsets, although it is clearly working on that with its acquisition of Cray's family of HPC interconnects back in April, its purchase of the InfiniBand chip and switch business from QLogic in January, and its purchase of Ethernet switch chip maker Fulcrum Microsystems back in July 2011.

However, SGI still has a good window in which to capitalize on its NUMAlink interconnect before Intel does whatever it's going to do to integrate interconnects with its CPUs and chipsets. It would not be surprising to see SGI sell the NUMAlink biz to Intel for a big chunk of change, or maybe even to an acquisitive Advanced Micro Devices or Hewlett-Packard. In fact, it would not be surprising at all if HP just up and bought SGI to get out of its Itanium conundrum with Oracle. But so far, SGI seems content to go it alone and to peddle rack and shared memory systems all by its lonesome.

A rack's worth of SGI's UV 2000 supercomputer

SGI put out a bit of a preview of the UV 2 lineup when Intel launched the Xeon E5-4600 processors a little more than a month ago. At the time, the company said that it was switching away from the Xeon 7500 and E7 and their multiple QuickPath Interconnect (QPI) ports, and away from the "Boxboro" 7500 chipset that it had used to interface with the NUMAlink 5 interconnect for lashing nodes tightly together in a memory-coherent fashion. The UV 1000 high-end machines were based on a two-socket blade.

The Xeon 7500 and E7 chips have four QPI ports coming off each socket. The original UV 1000 design used two of them to cross-link the two sockets on a blade together; of the remaining two ports on each socket, one went to the Boxboro chipset (which controls access to main memory and local I/O slots on the blade) and the other linked out to the NUMAlink 5 hub, which in turn has four links out to the NUMAlink 5 router. That router implements an 8x8 (paired node) 2D torus that can deliver up to 16TB of shared memory space across those 256 sockets.
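
The arithmetic behind those UV 1000 ceilings falls out of the figures above; here is a quick worked tally, our own reconstruction using only the numbers quoted in this article:

```python
# Worked tally of the UV 1000 ceilings, using only figures quoted above.
blades = 128                      # two-socket blades in a full system
sockets_per_blade = 2
sockets = blades * sockets_per_blade
shared_memory_tb = 16             # NUMAlink 5 shared memory ceiling

print("sockets:", sockets)                                   # 256
print("GB per socket:", shared_memory_tb * 1024 // sockets)  # 64
```

That works out to an average of 64GB behind each socket when the machine is maxed out.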

While SGI let it be known a month ago that it was ditching the Xeon E7s for the E5-4600s in the next-generation UV 2000 shared memory supers, the company did not say exactly how it was going to build these machines. (SGI had to save a little something to talk about at the International Supercomputing Conference in Hamburg, Germany this week, after all.) El Reg speculated that there would be a goosed interconnect and that SGI would stick to two-socket blades. We were right on the first count but wrong on the second: because the Xeon E5-4600 has two fewer QPI ports than the Xeon 7500 and E7, a two-socket blade would have had significantly diminished bandwidth out to the interconnect. It was easier and cleaner to make what is in effect a microserver and use both QPI ports to double up the links out to the new NUMAlink 6 interconnect hub, and that is what SGI has done.
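
A rough QPI port budget shows why the single-socket node was the cleaner answer. This is a sketch of the reasoning above, not SGI's board schematic, and the assumption that a two-socket E5-4600 blade would burn one port per socket on the cross-link is ours:

```python
# Rough QPI port budget per socket, following the reasoning in the article.
def numalink_ports(total: int, cross_link: int, chipset: int) -> int:
    """QPI ports on one socket left over for the NUMAlink hub."""
    return total - cross_link - chipset

# Xeon 7500/E7 blade: 4 ports; 2 cross-link the sockets, 1 feeds Boxboro.
print("UV 1000 blade socket:", numalink_ports(4, 2, 1), "port")   # 1
# Hypothetical E5-4600 two-socket blade: 2 ports, 1 eaten by the cross-link.
print("E5 two-socket blade :", numalink_ports(2, 1, 0), "port")   # 1
# Actual UV 2000 single-socket node: both ports free for NUMAlink 6.
print("UV 2000 node socket :", numalink_ports(2, 0, 0), "ports")  # 2
```

Freeing both ports on each socket is what lets the node controllers double up the links into the NUMAlink 6 hub.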

SGI would no doubt have preferred to build the original UV 1000 machines, which debuted in November 2009 and spanned 128 blades and 256 sockets in a shared memory configuration, using cheaper Xeon 5500 and 5600 processors. But those chips did not have enough QPI ports coming off their sockets, and their on-chip memory controllers could not address as much memory as the Xeon 7500s and E7s, so SGI had no choice but to use the fat Xeons in 2009 and await the less expensive E5-4600s here in 2012.

The memory expansion on the E5-4600 chip is the key to the rejiggered UV 2000 machine, since each processor socket can currently hold a dozen memory slots and address up to 384GB of memory without any external memory buffers or funky chipsets. But the real secret sauce in the UV 2000 is the NUMAlink 6 interconnect, which is a substantial re-engineering of the NUMAlink 5 interconnect that offers about 2.5 times the bandwidth and a much simpler system design as well.
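
A one-line sanity check on that per-socket figure (a trivial sketch; the 32GB DIMM size is our inference, not an SGI spec):

```python
# 384GB per socket across a dozen DIMM slots implies 32GB parts.
slots, max_gb = 12, 384
print(max_gb // slots, "GB per DIMM")   # 32
```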

Jill Matzke, director of server marketing at SGI, says that with the NUMAlink 6, a bunch of different things happened all at once. First, SGI's chip fab partner, Avago Technologies, did a process shrink, allowing for more stuff to be crammed onto the chip. (Avago, which is a spinout of Agilent Technologies, itself a spinout from Hewlett-Packard, doesn't actually make the NUMAlink chips; a fab in Taiwan does.) So SGI could take two of the NUMAlink hubs and put them onto a single chip, and could also bring the NUMAlink router onto the ASIC for the first time. Equally important, some of the functions that the NUMAlink hub and router had to perform for the Xeon 7500 and E7 chips are now handled by the Xeon E5s themselves, the on-chip PCI-Express controllers being one example. The result is a much simpler set of NUMAlink ASICs. (And you can see now why Intel wants to control the interconnects.)

With the UV 1000 design, there was a node controller in the blade chassis, shared by the nodes in that chassis, and a NUMAlink router at the top of the rack. With the UV 2000, more of the router functionality is contained in the NUMAlink hub/node controller on the system board, and the node controllers are doubled up for bandwidth. You can scale across two racks of UV 2000 machines without using an external top-of-rack router.

But, says Matzke, if you want extra bandwidth across those E5-4600 unisocket blades, you can add NUMAlink 6 routers at the top of the racks, too. This lets customers dial up CPU count and interconnect bandwidth independently of each other with the UV 2000, something you could not do with the UV 1000. The NUMAlink 6 interconnect provides 6.7GB/sec of bi-directional bandwidth.

A blade server from the UV 2 super

The basic node on the UV 2000 has two single-socket servers with a vertical extender card sandwiched between the two stacked motherboards, linking them together with a NUMAlink 6 hub chip. This packaging is similar, in concept, to the "Gemini" blade used in the ICE X Xeon E5-2600 clusters that were previewed last November at SC11 and that started shipping in March of this year. A 10U chassis holds eight half-width nodes, with up to 128 cores and 4TB of memory. A single rack has four of these chassis, for up to 512 cores and 16TB of memory; and a fully loaded UV 2000 has eight racks for a total of 4,096 cores and 64TB of global shared memory. If Intel had switched on one more bit in the E5-4600 memory controller, SGI could have pushed the memory up to the full 128TB it is physically possible to put in the 512 single-socket servers of the fully loaded UV 2000 machine. But it didn't, so you can't.
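
Rolling the node, chassis, and rack counts up to the full machine makes the addressing point explicit; the 46-bit reading of the memory ceiling is our inference from the 64TB figure, not something SGI has spelled out:

```python
# From node to full machine, using the figures quoted above.
cores_per_socket = 8        # top-bin Xeon E5-4600
sockets_per_node = 2        # two single-socket boards per UV 2000 node
nodes_per_chassis = 8       # half-width nodes in a 10U chassis
chassis_per_rack = 4
racks = 8

sockets = sockets_per_node * nodes_per_chassis * chassis_per_rack * racks
print("sockets:", sockets)                          # 512
print("cores:", sockets * cores_per_socket)         # 4096

# 16TB per rack times eight racks is 128TB physically installable, but a
# 46-bit physical address space tops out at 2**46 bytes, which is 64TB.
print("installable:", 16 * racks, "TB")             # 128
print("addressable:", 2**46 // 2**40, "TB")         # 64
```

One more usable address bit would have doubled the ceiling to 128TB, which is exactly the gap described above.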
