UV 2: RETURN of the 'Big Brain'. This time, it's affordable

Hefty loads bursting out of your box? Try this


Silicon Graphics is betting big on Intel's latest Xeon E5-4600 processor and its own revved up NUMAlink 6 shared memory interconnect, creating a "big brain computer" that can gang up to 4,096 cores into a single system image to run massive Linux workloads and fairly large Windows jobs, too. The new UV 2 is exactly the kind of box, says SGI, that customers with big data warehouse, big database, big data, and traditional HPC workloads have always wanted – and in many cases could never have afforded.

But the shift to new packaging and to lower-cost Xeon E5 processors, away from Itanium and then Xeon E7 chips from Intel, has made the shared memory systems from SGI more broadly accessible at just the time that many workloads seem to be busting out of general-purpose four-socket boxes. This is good news for SGI, which has had its share of financial woes as it chases the capricious and fiercely competitive HPC and hyperscale data center markets.

SGI will also be pleased to note that Intel has not yet got interconnect fabrics woven into its Xeon processors and chipsets, although it is clearly working on that with the acquisition of Cray's family of HPC interconnects back in April, its purchase of the InfiniBand chip and switch business from QLogic in January, and its purchase of Ethernet switch chip maker Fulcrum Microsystems back in July 2011.

However, SGI still has a good window in which to capitalize on its NUMAlink interconnect before Intel does whatever it's going to do to integrate interconnects with its CPUs and chipsets. It would not be surprising to see SGI sell the NUMAlink biz to Intel for a big chunk of change, or maybe even an acquisitive Advanced Micro Devices or Hewlett-Packard. In fact, it would not be surprising at all if HP just upped and bought SGI to get out of its Itanium conundrum with Oracle. But so far, SGI seems content to go it alone and to peddle rack and shared memory systems all by its lonesome.

A rack's worth of SGI's UV 2000 supercomputer

SGI put out a bit of a preview on the UV 2 lineup when Intel launched the Xeon E5-4600 processors a little more than a month ago. At the time, the company said that it was switching away from the Xeon 7500 and E7 and their multiple QuickPath Interconnect (QPI) ports. SGI had also said it was moving away from the "Boxboro" 7500 chipset that it had used to interface with the NUMAlink 5 interconnect for lashing nodes tightly together in a memory-coherent fashion. The UV 1000 high-end machines were based on a two-socket blade.

The Xeon 7500 and E7 chips have four QPI ports coming off each socket. The original UV 1000 design used two QPI ports on the Xeon 7500 or E7 chips to cross-link the two sockets together. One of the remaining two QPI ports went to the Boxboro chipset (which controls access to main memory and local I/O slots on the blade) and the other linked out to the NUMAlink 5 hub, which in turn has four links out to the NUMAlink 5 router. That router implements an 8x8 (paired node) 2D torus that can deliver up to 16TB of shared space across those 256 sockets.
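To make the torus arithmetic concrete, here is a minimal Python sketch of an 8x8 2D torus with wraparound links. The grid size, the node pairing, and the two-socket blade come from the description above; the coordinate scheme, the function names, and the idea that each grid point maps to exactly one blade pair are assumptions made for illustration, not SGI's actual wiring.

```python
# Sketch of an 8x8 2D torus like the one described for the NUMAlink 5 fabric.
# Assumption: each grid point stands for one pair of two-socket blades;
# the coordinates and neighbor order are illustrative only.

GRID = 8                 # 8x8 torus (from the article)
BLADES_PER_POINT = 2     # "paired node" (from the article)
SOCKETS_PER_BLADE = 2    # two-socket UV 1000 blade (from the article)


def torus_neighbors(x, y, size=GRID):
    """Return the four neighbors of (x, y), wrapping around at the edges."""
    return [
        ((x - 1) % size, y),
        ((x + 1) % size, y),
        (x, (y - 1) % size),
        (x, (y + 1) % size),
    ]


def torus_hops(a, b, size=GRID):
    """Minimum hop count between two grid points on the torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # On a torus you can go either way around each ring.
    return min(dx, size - dx) + min(dy, size - dy)


total_blades = GRID * GRID * BLADES_PER_POINT
total_sockets = total_blades * SOCKETS_PER_BLADE

print(total_blades, total_sockets)    # 128 256
print(torus_neighbors(0, 0))          # corner point wraps to row/column 7
print(torus_hops((0, 0), (4, 4)))     # farthest point on an 8x8 torus: 8
```

The totals line up with the 128 blades and 256 sockets quoted for the UV 1000, and the wraparound is what keeps the worst-case path at eight hops rather than the fourteen a plain 8x8 mesh would need.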

While SGI let it be known a month ago that it was ditching the Xeon E7s for the E5-4600s in the next-generation UV 2000 shared memory supers, the company did not say exactly how it was going to build these machines. (SGI had to save a little something to talk about at the International Supercomputing Conference in Hamburg, Germany this week, after all.) El Reg speculated that there would be a goosed interconnect and that SGI would stick to two-socket blades. We were right on the first count but wrong on the second: because there are two fewer QPI ports on the Xeon E5-4600 than on the Xeon 7500 and E7, the bandwidth available on a two-socket blade would have been significantly diminished. It was easier and cleaner to make what is in effect a microserver and use the QPI ports to double up out to the new NUMAlink 6 interconnect hub, and that is what SGI has done.
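The port budget behind that decision can be tallied in a few lines of Python. The port counts are the ones given in the article; the role names are labels invented for this sketch, and the two-socket E5 layout is a hypothetical design that SGI did not build, included only for comparison.

```python
# QPI port budget per socket. Counts are from the article; the dictionary
# keys are made-up labels, not Intel or SGI terminology.

UV1000_SOCKET = {                      # Xeon 7500 / E7: four QPI ports
    "cross_link_to_other_socket": 2,
    "boxboro_chipset": 1,
    "numalink5_hub": 1,
}

UV2000_SOCKET = {                      # Xeon E5-4600: two QPI ports
    "numalink6_hub": 2,                # both ports double up to the hub
}

# Hypothetical: a two-socket E5-4600 blade would have to spend one of its
# two ports on the neighboring socket, leaving one for the fabric.
HYPOTHETICAL_TWO_SOCKET_E5 = {
    "cross_link_to_other_socket": 1,
    "numalink6_hub": 1,
}


def ports_used(allocation):
    """Total QPI ports consumed on one socket."""
    return sum(allocation.values())


def ports_to_fabric(allocation):
    """QPI ports on one socket that lead to a NUMAlink hub."""
    return sum(n for role, n in allocation.items() if role.startswith("numalink"))


for name, alloc in [
    ("UV 1000", UV1000_SOCKET),
    ("UV 2000", UV2000_SOCKET),
    ("two-socket E5 (hypothetical)", HYPOTHETICAL_TWO_SOCKET_E5),
]:
    print(name, ports_used(alloc), ports_to_fabric(alloc))
```

Under these assumptions, the single-socket layout is the only E5-4600 option that sends both QPI ports to the interconnect hub.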

SGI would no doubt have preferred to build the original UV 1000 machines, which debuted in November 2009 and which spanned 128 blades and 256 sockets in a shared memory configuration, using cheaper Xeon 5500 and 5600 processors. But these chips have only two QPI ports coming off their sockets and their on-chip memory controllers cannot address as much memory as the Xeon 7500s and E7s, so SGI had no choice but to use the fat Xeons in 2009 and await the less expensive E5-4600s here in 2012.

The memory expansion on the E5-4600 chip is the key to the rejiggered UV 2000 machine, since each processor socket can currently drive a dozen memory slots and address up to 384GB of memory without any external memory buffers or funky chipsets. But the real secret sauce in the UV 2000 is the NUMAlink 6 interconnect, which is a substantial re-engineering of the NUMAlink 5 interconnect that offers about 2.5 times the bandwidth and a much simpler system design as well.

Jill Matzke, director of server marketing at SGI, says that with the NUMAlink 6, a bunch of different things happened all at once. First, SGI's chip fab partner, Avago Technologies, did a process shrink, allowing for more stuff to be crammed onto the chip. (Avago, which is a spinout of Agilent Technologies, itself a spinout from Hewlett-Packard, doesn’t actually make the NUMAlink chips; a fab in Taiwan does.) So SGI could take two of the NUMAlink hubs and put them onto a single chip. SGI could also bring the NUMAlink router onto the ASIC for the first time. Equally important, some of the functions that had been performed by the NUMAlink hub and router using the Xeon 7500 and E7 chips are now done by the Xeon E5s themselves; PCI-Express controllers are one new on-chip function. This is a much simpler set of NUMAlink ASICs. (And you can see now why Intel wants to control the interconnects.)

With the UV 1000 design, there was a node controller in the blade chassis – which the nodes in the chassis shared – and a NUMAlink router at the top of the rack. With the UV 2000, more of the router functionality is contained in that NUMAlink hub/node controller that is on the system board and the node controllers are doubled up for bandwidth. You can scale across two racks of UV 2000 machines without using an external top-of-rack router.

But, says Matzke, if you want to add extra bandwidth across those E5-4600 unisocket blades, you can add NUMAlink 6 routers at the top of the racks, too. This allows customers to dial up the CPU and bandwidth scalability independently of each other with the UV 2000, something you could not do with the UV 1000. The NUMAlink 6 interconnect provides 6.7Gb/sec of bi-sectional bandwidth.

A blade server from the UV2 super

The basic node on the UV 2000 has two single-socket servers with a vertical extender card sandwiched between the two stacked motherboards and linking them together with a NUMAlink 6 hub chip. This packaging is similar, in concept, to the "Gemini" blade used in the ICE X Xeon E5-2600 clusters that were previewed last November at SC11 and that started shipping in March of this year. A 10U chassis holds eight half-width nodes, with up to 128 cores and 4TB of memory. A single rack has four of these, for up to 512 cores and 16TB of memory; and a fully loaded UV 2000 has eight racks for a total of 4,096 cores and 64TB of global shared memory. If Intel had switched on one more bit in the E5-4600 memory controller, SGI could have pushed the memory up to the full 128TB of memory it is physically possible to put in the 512 nodes in the fully loaded UV 2000 machine. But it didn't, so you can't.
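The scaling figures can be checked with a short Python sketch. The per-chassis, per-rack, and rack-count numbers are taken from the paragraph above; the eight cores per socket is inferred from 128 cores across 16 sockets, and the 46-bit physical address width is an assumption about the Xeon E5 that the article does not state. Multiplied out, eight racks come to 4,096 cores, the same number quoted at the top of the article.

```python
# Capacity arithmetic for a fully loaded UV 2000, using the article's figures.
# Assumptions (not stated in the article): 8 cores per socket, and a 46-bit
# physical address space on the Xeon E5.

SERVERS_PER_NODE = 2       # two single-socket servers per node
NODES_PER_CHASSIS = 8      # eight half-width nodes per 10U chassis
CHASSIS_PER_RACK = 4
RACKS = 8
CORES_PER_SOCKET = 8       # assumption: 128 cores / 16 sockets per chassis
TB_PER_CHASSIS = 4         # installable memory per chassis
ADDRESS_BITS = 46          # assumption: Xeon E5 physical address width

sockets_per_chassis = NODES_PER_CHASSIS * SERVERS_PER_NODE
cores_per_chassis = sockets_per_chassis * CORES_PER_SOCKET
cores_per_rack = cores_per_chassis * CHASSIS_PER_RACK
total_sockets = sockets_per_chassis * CHASSIS_PER_RACK * RACKS
total_cores = cores_per_rack * RACKS
installable_tb = TB_PER_CHASSIS * CHASSIS_PER_RACK * RACKS


def addressable_tb(bits):
    """Size of a physical address space of the given width, in TB (2**40 bytes)."""
    return 2 ** bits // 2 ** 40


print(cores_per_chassis, cores_per_rack, total_cores)   # 128 512 4096
print(total_sockets, installable_tb)                     # 512 128
print(addressable_tb(ADDRESS_BITS))                      # 64
print(addressable_tb(ADDRESS_BITS + 1))                  # 128
```

If the 46-bit assumption holds, it accounts for both halves of the claim: 64TB is the addressing ceiling, and one more address bit would double it to match the 128TB of memory that physically fits.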


Biting the hand that feeds IT © 1998–2022