
Cray bags $30m to upgrade Edinburgh super to petaflops-class

Hector XE6 to be upgraded to Archer XC30


It is not much of a surprise, seeing as how the UK's national supercomputing service at the University of Edinburgh is a long-time Cray customer, that it would return to Cray to replace its existing XE6 system with a petaflops-class machine based on the latest Cray interconnect and Intel processor technology.

The new machine, to be called "Archer," will be based on Cray's latest XC30 iron, which pairs Intel's Xeon E5 processors with the "Aries" interconnect developed by the supercomputer maker through a contract with the US Defense Advanced Research Projects Agency.

The Engineering and Physical Sciences Research Council is providing the $30m in funding for the Archer system, which includes a production box with nearly three times the performance of the existing XE6 system, which has over 800 teraflops of aggregate number-crunching oomph and is nicknamed "Hector" – actually, they spell that "HECToR", for High-End Computing Terascale Resource, but we're not going to be drawn into such orthographic silliness.

The precise configuration details for the system were not provided by Cray or the University of Edinburgh, most likely because the machine will be based on the as-yet-unannounced "Ivy Bridge-EP" Xeon E5 processors from Intel. Earlier this year, Chipzilla said to expect the Xeon E5 chips to be launched in the third quarter, and early shipments of the Ivy Bridge variants of these chips to key cloud and supercomputer customers began several months ago.

The Hector system comprises 30 cabinets, which hold a total of 704 of Cray's four-node XE6 blade servers. Each node on the blade has two sixteen-core "Interlagos" Opteron 6276 processors running at 2.3GHz. Each socket on the XE6 blade has 16GB of main memory, and with 2,816 compute nodes, that works out to 90,112 cores and 88TB of main memory.
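For anyone who wants to check the sums, here is a quick back-of-the-envelope sketch in Python. The blade, node, socket, core, memory, and clock figures are the ones quoted above; the 2,816 nodes spread over 704 blades imply four nodes per blade. The figure of four double-precision flops per core per clock is our assumption for the Interlagos chips, not something Cray or Edinburgh stated, so treat the peak number as a rough estimate only.

```python
# Back-of-the-envelope check of the Hector configuration figures quoted above.
BLADES = 704
COMPUTE_NODES = 2_816
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 16
GB_PER_SOCKET = 16
CLOCK_GHZ = 2.3

# Assumption (not from the article): 4 double-precision flops per core per clock.
FLOPS_PER_CORE_PER_CLOCK = 4

# Nodes per blade implied by the node and blade counts.
nodes_per_blade = COMPUTE_NODES // BLADES

# Total cores and total main memory (1TB taken as 1,024GB).
total_cores = COMPUTE_NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
total_memory_tb = COMPUTE_NODES * SOCKETS_PER_NODE * GB_PER_SOCKET / 1024

# Rough theoretical peak: cores x GHz x flops per clock gives gigaflops.
peak_teraflops = total_cores * CLOCK_GHZ * FLOPS_PER_CORE_PER_CLOCK / 1000

print(f"nodes per blade: {nodes_per_blade}")
print(f"total cores:     {total_cores:,}")
print(f"total memory:    {total_memory_tb:.0f}TB")
print(f"estimated peak:  {peak_teraflops:.0f} teraflops")
```

Under that flops-per-clock assumption, the estimate lands in the same ballpark as the "over 800 teraflops" figure cited for Hector.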

The XE6 blade has two "Gemini" router interconnect chips, which implement a 3D torus across the nodes and allow the system to scale up to multiple petaflops. The Hector machine also has a Lustre clustered file system that scales to over 1PB of capacity.

Schematic of the Archer production and development HPC systems


The XC30 supercomputer was developed under the codename "Cascade" by Cray with funding from DARPA's High Productivity Computing Systems program, which started a decade ago. That funding came in two phases, with the initial lump sum of $43.1m being used to outline how Cray would converge various machines based on x86, vector, FPGA, and MTA multithreaded processors into a single platform. (GPU coprocessors had not become a thing yet in 2003 when the initial DARPA award came out.)

Three years later, DARPA gave Cray a $250m research award to further develop the Cascade system and its Aries Dragonfly interconnect as well as work on the Chapel parallel programming language. No one outside of Cray or DARPA knew it at the time, but the second part of the HPCS contract from DARPA originally called for Cray to take its multistreaming vector processors (all but forgotten at this point) and its massively multithreaded ThreadStorm processors (at the heart of the Urika graph analysis appliance) and combine them into a superchip.

But in January 2010, DARPA cut $60m from the Cascade contract and Cray focused on the very fast and expandable Dragonfly interconnect. The whole project cost $233.1m to develop, and now Cray has the right to sell iron based on that technology.

Cray got another $140m in April 2012 when it sold the intellectual property to the Gemini and Aries interconnects to Intel. Cray retains the rights to sell the Aries interconnect and is working with Intel on future interconnects, possibly codenamed "Pisces" and presumably used in the "Shasta" massively parallel systems that Intel and Cray said they are working on together as they announced the Aries interconnect sale.

The important thing as far as the University of Edinburgh is concerned is that the XC30 system has lots and lots of headroom – in fact, a fully loaded XC30 is designed to scale to well over 100 petaflops. But the UK HPC center is not going to be pushing the limits of the XC30 any time soon, with the Archer system having a little more than 2 petaflops of oomph. (Nearly three times the performance of the Hector machine, as the Cray statement explained.)

The plan for Archer, according to the bidding documents, calls for the university to get a test and development system as well as a much larger production system, with both boxes having compute nodes with standard memory but with a subset having fatter memory configurations.

A 56Gb/sec InfiniBand network links out to the Sonexion file systems (which have a total of 4.8PB of capacity and 100GB/sec of bandwidth into the system) and the login servers that sit in front of the production Archer machine. A 10Gb/sec Ethernet network hooks into other local storage and tape archiving, as well as to other network services.
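To put those storage numbers in perspective, a rough Python sketch of how long it would take to stream the whole 4.8PB through the 100GB/sec pipe. It assumes decimal units (1PB = 1,000,000GB) and that the full quoted bandwidth is sustained throughout, which real workloads are unlikely to manage, so this is a lower bound rather than a benchmark.

```python
# Idealized time to read or write the entire Sonexion capacity once.
CAPACITY_PB = 4.8
BANDWIDTH_GB_PER_SEC = 100

# Assumption: decimal units, so 1PB = 1,000,000GB.
GB_PER_PB = 1_000_000

seconds = CAPACITY_PB * GB_PER_PB / BANDWIDTH_GB_PER_SEC
hours = seconds / 3600

print(f"{seconds:,.0f} seconds, or about {hours:.1f} hours")
```

In other words, even at full tilt, a complete pass over the file systems is a matter of hours, not minutes.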

The Archer deal includes the cost of the XC30 systems and the Sonexion storage as well as a multi-year services contract, all worth a combined $30m. (Yes, this kind of bundling makes it very tough to figure out what the system and storage hardware costs individually from services, and this is absolutely intentional.) Archer is expected to be put into production this year.

As we mentioned earlier, the Edinburgh HPC center is a long-time Cray customer, having installed a T3D parallel system in 1994 and added a T3E system in 1996 that peaked out at 309 gigaflops (you read that right) when it was retired in 2002. ®

