Argonne taps IBM for 10 petaflops super

More BlueGene/Q details emerge


The US Department of Energy's Argonne National Laboratory announced on Tuesday that it has inked a deal with IBM to build a monster BlueGene supercomputer that will weigh in at 10 petaflops of peak theoretical performance when it is operational around the middle of next year.

El Reg caught wind of the Mira BlueGene/Q massively parallel super going into Argonne back in October, when Cray announced that it had been able to sell an 18,000-core, Opteron-based XE6 super into the Argonne facility, even though the lab has been an IBM stronghold in recent years.

That Cray box weighs in at 150 teraflops of peak performance, and will be utterly dwarfed by the Mira BlueGene/Q machine – unless Argonne hands Cray a bunch of money to expand the Beagle XE6 box. The XE6 architecture can scale to more than 1 million cores and multiple petaflops of performance.

Last November, at the SC10 supercomputing conference in New Orleans, IBMers walked us through the BlueGene/Q prototype, which was on display publicly for the first time and which we compared to the prior BlueGene/L and BlueGene/P machines. IBM was a bit vague about the details of the processors used in the BlueGene/Q system, but provided some feeds and speeds – some of which turn out not to be true.

A rack of IBM BlueGene/Q HPC action

Back at SC10, an IBM software engineer working on BlueGene/Q said that the machine was based on a new Power-derived chip that ran at 1.6GHz. This chip had 16 cores for doing calculations and a 17th core for running a Linux kernel. These were 64-bit cores with four threads per core, just like the cores used in IBM's eight-core Power7 chips for its current generation of Power Systems servers, and the 16-core Power A2 "wire-speed" chip, which is not used in anything yet.

At the time, El Reg speculated that the BlueGene/Q processor would be a modified version of the Power A2 chip, but one running at a lower speed – 1.6GHz versus 2.3GHz – and with an extra core.

On Tuesday, IBM's hardware people cleared up the mystery about the processor at the heart of the BlueGene/Q super, saying that it is not some funky 17-core part, but rather a 16-core Power A2 processor that has been geared down and given special features for fast thread context switching. (This is what happens when you let a software engineer tell you about the hardware, I guess.) The BlueGene/Q machine will put one of these Power A2 chips on each server node, with 8GB or 16GB of main memory per node (512MB to 1GB per core) running at 1.33GHz. That's nearly the same speed as the processor, which is necessary to squeeze more efficiency out of the BlueGene/Q machine.

The compute nodes have water blocks on the processors, main memory, and optical interconnects, and use water that is between 16 and 25 degrees Celsius (61 to 77 degrees Fahrenheit) to suck the heat out of the nodes. The water cools the optics first, then the compute nodes.

The BlueGene/Q compute drawer

With the BlueGene/Q design, IBM is separating compute nodes from I/O nodes, which will allow compute and I/O capacity to scale independently of each other (which you cannot do in the BlueGene/L and BlueGene/P designs).

The compute drawer has 32 of the single-socket Power A2 modules, and up to 32 of these drawers can be crammed into a rack for a total of 16,384 cores per rack. The I/O nodes go into an I/O drawer; the drawer can hold up to eight I/O nodes based on the same PowerPC A2 boards, as well as up to eight 10 Gigabit Ethernet or InfiniBand adapters for linking to the outside world. The BlueGene/Q design allows from 8 to 128 I/O nodes to be used per rack, with 16 being the default.
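
For those keeping score, the density figures hang together. The quick arithmetic below uses only numbers quoted in this article: the per-chip core count, the drawer and rack counts, and the 8GB and 16GB node memory options mentioned earlier.

```python
# Quick arithmetic using only the figures quoted in this article.
cores_per_chip = 16
chips_per_drawer = 32
drawers_per_rack = 32

nodes_per_rack = chips_per_drawer * drawers_per_rack     # 1,024 compute nodes
cores_per_rack = nodes_per_rack * cores_per_chip         # 16,384 cores
print(nodes_per_rack, cores_per_rack)

# Per-core memory for the two node memory options mentioned above
for node_gb in (8, 16):
    print(f"{node_gb}GB node -> {node_gb * 1024 // cores_per_chip}MB per core")
# 8GB  node -> 512MB per core
# 16GB node -> 1024MB (1GB) per core
```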

The compute nodes run IBM's homegrown open source Linux kernel, with the front-end and service nodes running Red Hat's Enterprise Linux 6. The I/O nodes are also going to run RHEL 6.

The BlueGene/Q I/O node

Last fall, IBM said that the optical interconnect used in the BlueGene/Q machine would be a 5D torus mesh, and that turns out to be true. I am pretty good at visualizing in 2D and 3D, but 5D gives me trouble. To build one, you link a block of nodes together as a hypercube, and then link the vertices of those blocks to one another as a hypercube in turn.
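
If, like me, you find five dimensions hard to picture, the topology is easier to reason about as coordinates than as a picture: each node gets a five-part address, and its direct neighbors sit one hop away, with wraparound, along each of the five axes. Here is a minimal Python sketch of that addressing scheme. The dimension sizes are made up for illustration (though 4 to the fifth power does happen to equal the 1,024 compute nodes in a fully loaded rack); the article gives no details of how IBM actually wires or routes the real thing.

```python
# A minimal sketch of 5D torus addressing, just to make the topology concrete.
# The dimension sizes below are illustrative only, not IBM's actual wiring.
DIMS = (4, 4, 4, 4, 4)   # 4**5 = 1,024 nodes, one fully loaded rack's worth

def neighbors(coord, dims=DIMS):
    """Return the neighbors of a node in a torus: one hop in the +1 and -1
    direction along each dimension, with wraparound at the edges."""
    result = []
    for axis, size in enumerate(dims):
        for step in (+1, -1):
            hop = list(coord)
            hop[axis] = (hop[axis] + step) % size
            result.append(tuple(hop))
    return result

# In a 5D torus every node has 10 direct links (2 per dimension).
print(len(neighbors((0, 0, 0, 0, 0))))    # -> 10
```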

This 5D torus/mesh network, says Dave Turek, vice president of deep computing at IBM, has a bandwidth of 40GB/sec and is used for collective operations on the machine; it also carries global barrier/interrupt traffic. The machine has a Gigabit Ethernet network for booting individual nodes, debugging, and monitoring, plus PCI-Express buses coming out of the I/O nodes to link to storage and the outside world.

Turek says that the Mira machine will essentially be half of the Sequoia BlueGene/Q super that is going into Lawrence Livermore National Laboratory next year; Lawrence Livermore is another DOE lab, and the one that took the very first BlueGene/L super. Sequoia will weigh in at 20.13 petaflops of aggregate, raw number-crunching power. Mira will have over 750,000 cores and more than 750TB of main memory to reach its 10-petaflops performance.
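
As a rough sanity check on those peak figures, assume (and this is our assumption, not something IBM stated) that each 1.6GHz A2 core can retire eight double-precision floating point operations per clock through a four-wide fused multiply-add unit. The quoted petaflops numbers then fall out of simple arithmetic:

```python
# Back-of-the-envelope check of the quoted peak numbers. The 8 flops/cycle
# figure is an assumption (a four-wide double-precision FMA unit per core);
# the article itself only gives the 1.6GHz clock and the peak petaflops.
clock_ghz = 1.6
flops_per_cycle = 8                                  # assumption
gflops_per_core = clock_ghz * flops_per_cycle        # 12.8 GF/s per core

for name, pflops in (("Mira", 10.0), ("Sequoia", 20.13)):
    cores = pflops * 1e6 / gflops_per_core
    print(f"{name}: ~{cores:,.0f} cores for {pflops} PF peak")
# Mira works out to roughly 781,000 cores, consistent with the quoted
# "over 750,000 cores"; Sequoia to a bit over 1.5 million, which squares
# with Turek's line that Mira is essentially half of Sequoia.
```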

IBM has never provided revenue and shipment figures for the BlueGene line of machines, but Turek said that IBM has sold dozens of large-scale machines over the years and hundreds of smaller boxes, adding up to many petaflops of aggregate performance. Turek did not divulge what the Sequoia and Mira machines were sold for, but did say that IBM's internal analysis shows that the BlueGene/Q machine will take about $100m in research and development to bring to market.

While IBM likes to make money, the BlueGene family of machines is more important to IBM as a means of hitting barriers to performance, scalability, and reliability and finding ways around them.

"BlueGene has taken us a step forward in systems design," Turek tells El Reg. "What I have observed is that if you encounter design problems at the peak of supercomputing, then somewhere between four and seven years later you have to deal with this issue in the commercial arena. BlueGene has been terrific at helping us deal with issues of reliability and scale. There has been and continues to be a lot of bantering about this in the industry, but my mantra is simple: you don't know anything until you build it and you have that empirical device in hand."

Argonne has 16 different research programs that are hot to trot to get access to the Mira box, which will be used for materials science, chemistry, nuclear physics, combustion, and energy research, including helping battery manufacturers design better products. Presumably the goal is to build a better battery before the Chinese do – and take over that industry, too. ®

