Japanese nuke lab erects 200 teraflop super
Heads for 'Venus'
Server maker Fujitsu has announced that the Japan Atomic Energy Agency will be building a 200 teraflops cluster based on Intel's 'Nehalem EP' Xeon 5500 processors and Fujitsu's blade form factor. JAEA is also buying two Sparc-based clusters, foundations for even larger petaflops-scale supers that Fujitsu plans to build using its future 'Venus' eight-core Sparc64-VIII processors.
Today, JAEA relies on two clusters of a more modest variety - one offering 13.1 teraflops of performance, the other 2.4 teraflops. Considering how important nuclear power is to Japan and the amount of computing capacity that the United States, the United Kingdom, and France use when doing nuclear research, these are relatively puny clusters. But they're not as puny as you might think. Whereas a lot of the nuclear research that the Western nations do involves weapons, Japan just does research on fission and fusion reactors and how to best handle nuclear fuel and waste.
But JAEA is still looking for more power, and it has aspirations of reaching the petaflops level to advance its research for fission and fusion reactors.
The biggest machine that JAEA currently has is an Altix 3700 Bx2 shared memory system from Silicon Graphics. This box uses 2,048 single-core 1.6 GHz Itanium 2 processors and has a mere 13 TB of memory matched up against its 13.1 teraflops of number-crunching power (peak, not sustained).
It is now four years old and looking long in the tooth. In June 2005, when it was installed, the machine ranked 15th on the Top 500 supers list, but it fell off the list in November 2008. The agency also has a 2.4 teraflops cluster of unknown technology that is used specifically to simulate its fast breeder reactor.
But this iron will soon be history. The agency has contracted with Fujitsu to build a parallel Linux super based on its new Primergy BX900 Dynamic Cube blade servers, which were announced in early May. The plan calls for JAEA to install 2,157 blades using the quad-core 2.93 GHz X5570 processors (the fastest 95 watt versions of the Nehalem EPs), for a total of 17,256 cores. The nodes will be linked together using quad data rate (40 Gb/sec) InfiniBand switches, and the resulting cluster will have a peak theoretical performance of 200 teraflops.
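The core count and peak rating quoted above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is only that: it assumes two sockets per blade (inferred from 17,256 cores across 2,157 quad-core blades) and four double-precision floating point operations per core per clock, neither of which is stated in Fujitsu's announcement. The function and constant names are made up for illustration.

```python
def peak_teraflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = cores x clock (GHz) x flops per cycle, converted to teraflops."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


BLADES = 2157
SOCKETS_PER_BLADE = 2  # assumption: inferred from 17,256 cores / 2,157 blades / 4 cores
CORES_PER_SOCKET = 4   # quad-core X5570
FLOPS_PER_CYCLE = 4    # assumption: double-precision flops per core per clock

# The new Xeon cluster: 2.93 GHz X5570 processors.
xeon_cores = BLADES * SOCKETS_PER_BLADE * CORES_PER_SOCKET
xeon_peak = peak_teraflops(xeon_cores, 2.93, FLOPS_PER_CYCLE)

# The existing Altix 3700 Bx2: 2,048 single-core 1.6 GHz Itanium 2 chips,
# under the same assumed four flops per clock.
altix_peak = peak_teraflops(2048, 1.6, FLOPS_PER_CYCLE)

print(f"Xeon cluster: {xeon_cores:,} cores, {xeon_peak:.1f} teraflops peak")
print(f"Altix 3700 Bx2: {altix_peak:.1f} teraflops peak")
```

Under those assumptions the numbers land at roughly 202 teraflops for the Xeon cluster and 13.1 teraflops for the Altix, in line with the figures quoted. These are peak ratings, not sustained performance.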
This machine - which will be operational in March 2010 - will be used to do nuclear fusion simulations, and JAEA estimates that its simulation code will require a minimum of 100 teraflops to run. JAEA might be installing a Linux-x64 cluster now, but it looks like it's making some bets on future Fujitsu supercomputer nodes and shared memory systems.
JAEA is also installing a Sparc Enterprise M9000 machine, rated at 1.92 teraflops and using the current quad-core Sparc64-VII processors, as a big memory box. It is adding a network of FX1 single-socket Sparc64-VII server nodes as well, as the test bed for a future petaflops-scale super that JAEA plans to install. That development cluster will have 320 nodes and 1,280 cores (four cores per node) and is rated at 12 teraflops of peak performance. The Sparc machines presumably will run Solaris.
All of the machines are managed using Fujitsu's own Parallelnavi cluster and job management software, and they share access to 1.2 petabytes of Fujitsu's Eternus DX80 disk arrays.
When it is operational, the 200 teraflops Xeon-Linux super will be the most powerful machine in Japan. But JAEA is implying that it is going to reach for petaflops, and to do so, it will be using the Sparc architecture, not x64 chips.
As El Reg reported back in May, Japanese server makers NEC and Hitachi have both pulled out of the $1.2bn Next Generation Supercomputer Project, which the Japanese government is sponsoring to create a hybrid scalar and vector supercomputer. NEC and Hitachi were to supply the vector half, with Fujitsu and its eight-core Venus Sparc64-VIII chips handling the scalar half. NEC did all the design work for the vector half of this Project Keisoku machine, which was intended to scale to 10 petaflops of peak performance. But in May, after reporting an $8bn loss for its fiscal 2009 year ended in March, NEC said that it could not actually manufacture the vector half of the Keisoku system without incurring losses, and it walked away from the deal along with partner Hitachi.
That leaves Fujitsu and the Rikagaku Kenkyusho (Riken) research lab in Kobe, Japan saying they will build the fastest scalar computer in Japan, presumably using the Sparc64 Venus chips. It was no accident that Fujitsu was touting the Venus design the day before NEC and Hitachi announced they were ditching the project for financial reasons.
It may look like JAEA is going to follow the lead of the Riken lab, based on the development machine it is installing alongside the new Linux-x64 cluster. But in fact, the JAEA Sparc64 machines will be doing some of the software application development groundwork for the Keisoku system, which is expected to be operational in early 2012.
JAEA stopped short of saying that it would eventually replace the Linux-x64 machine with a giant Sparc64 box. This is the supercomputer business, where technology decisions are based on budgets and politics as much as (or maybe more than) on technology. No matter what JAEA does, it is clear that it has to port its code off Itanium processors onto something, and you can bet that Silicon Graphics wants to peddle its future 'UltraViolet' Xeon-NUMAflex shared memory machines (the kickers to the Altix) to the nuke lab. But it looks like the political tide has shifted, and Japan is looking for homes for indigenous products, and that means SGI is facing a tough sell. Maybe an impossible one. ®