While supercomputer maker Silicon Graphics was showing off its existing Altix lines of Xeon and Itanium servers at the SC08 supercomputing show in Austin, Texas, this week, the most interesting thing the company touted was not yet a real computer, but a concept system, called Molecule.
The Molecule machine takes a few pages out of IBM's BlueGene massively parallel supercomputer book. The main one is that for some workloads, where a large number of compute nodes need to be brought to bear to run a simulation, it makes more sense to have relatively modest processors instead of big fat ones.
IBM built the BlueGene/L super from its embedded PowerPC 440 dual-core processors. SGI's Molecule concept machine would be built from Intel dual-core Atom x64 chips, which are based on 45 nanometer processes and are designed for netbooks and other portable computing devices where long battery life, not computing power, is the limit of usefulness. The chips run at between 800 MHz and 1.67 GHz and implement HyperThreading, so they can deliver up to two virtual threads per core.
With the BlueGene box, IBM controlled not only the chip but also the interface off the chip and out into the system interconnect. Michael Brown, sciences segment manager at SGI, who was showing off the Molecule concept box, says that SGI can't really control the interconnect Intel will put on Atom boards. But presumably a fast enough interconnect could be designed to plug multiple Atom boards into a chassis.
The Molecule concept machine puts a dual-core Atom N330, code-named "Diamondville," on a system board that is about the size of a credit card. This particular chip runs at 1.6 GHz and has a thermal design point of about 8 watts. The Atom N330 is not a true dual-core chip, but rather two single-core Atoms side-by-side in a single chip package (it really isn't even a socket) that is mounted to the board. Brown said that the future "Lincroft" iteration of the Atom chip would be an interesting possibility. That chip will put a DDR2 memory controller on the die, which would eliminate the need for an external chipset, since the Molecule boards have no direct-attached storage other than main memory. But Brown made no commitments to SGI actually using this chip.
In any event, the Molecule board had four memory DIMMs soldered directly to the board and linked to the chip, which provided 2 GB of memory capacity. The interconnect is along the same side of the board as the memory chips, and would plug into a backplane of some sort that would reach out to external storage and networks, much as blade servers do inside their chassis.
The Molecule design glues two of these Atom boards to a hollow ceramic cartridge that is used to hold the boards in place, to draw heat off the boards, and to channel cooling air that comes in through the bottom of the chassis and is diverted at a 90 degree angle out the back of the chassis. The cartridges interlace to create a bunch of channels, and have fins and baffles inside to direct airflow very precisely. SGI calls this Atom board packaging Kelvin.
Kelvin, lording over the Atoms in the Molecule
The concept machine at the SC08 show was a 3U rack that contained 180 of the Atom boards, for a total of 360 cores. These boards would present 720 virtual threads to a clustered application, and have 360 GB of main memory (using 512 MB DDR2 DIMMs mounted on the board) and a total of 720 GB/sec of memory bandwidth. The important thing to realize, explained Brown, is that if the interconnect were architected correctly, the entire memory inside the chassis could be searched in one second. That memory bandwidth, Brown explained, was up to 15 TB/sec per rack, or about 20 times that of a single-rack cluster these days. This setup would be good for applications where cache memory or out-of-order execution don't help, but massive numbers of threads do. (Search, computational fluid dynamics, seismic processing, stochastic modeling, and others were mentioned.)
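For readers who want to check the chassis arithmetic, here is a minimal sketch that works only from the per-board figures quoted above (two cores, two threads per core, four 512 MB DIMMs, 180 boards, 720 GB/sec aggregate). The function and constant names are illustrative, not anything SGI has published, and the scan time assumes an ideal interconnect with no overhead.

```python
# Per-board and per-chassis figures quoted in the article.
CORES_PER_BOARD = 2          # dual-core Atom N330
THREADS_PER_CORE = 2         # HyperThreading
DIMMS_PER_BOARD = 4
DIMM_MB = 512
BOARDS_PER_CHASSIS = 180
CHASSIS_BANDWIDTH_GB_S = 720  # aggregate memory bandwidth


def chassis_totals(boards=BOARDS_PER_CHASSIS):
    """Return aggregate figures for one 3U Molecule chassis (hypothetical helper)."""
    cores = boards * CORES_PER_BOARD
    threads = cores * THREADS_PER_CORE
    memory_gb = boards * DIMMS_PER_BOARD * DIMM_MB / 1024
    return {
        "cores": cores,
        "threads": threads,
        "memory_gb": memory_gb,
        # Share of the aggregate bandwidth attributable to each board.
        "bandwidth_per_board_gb_s": CHASSIS_BANDWIDTH_GB_S / boards,
        # Idealized time to stream through all memory once.
        "scan_seconds": memory_gb / CHASSIS_BANDWIDTH_GB_S,
    }


if __name__ == "__main__":
    for name, value in chassis_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions, the idealized scan time comes in under Brown's one-second figure, which leaves headroom for real-world interconnect overhead.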
The other advantages that the Molecule system might have are low energy use and low cost. The aggregate memory bandwidth in a rack of these machines (that's 10,080 cores with 9.8 TB of memory) would deliver about 7 times the GB per second per watt of a rack of x64 servers in a cluster today, according to Brown. On applications where threads rule, the Molecule would do about 7 times the performance per watt of x64 servers, and on SPEC-style floating point tests, it might even deliver twice the performance per watt. On average, SGI is saying performance per watt should be around 3.5 times that of a rack of x64 servers.
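The rack-level figures can be cross-checked the same way. This is a rough sketch assuming a rack is simply a stack of identical chassis, each holding 360 cores and 2 GB per dual-core board as described earlier; the names are illustrative, and the chassis count is inferred from the core count rather than stated by SGI.

```python
# Figures quoted in the article.
RACK_CORES = 10_080
CORES_PER_CHASSIS = 360   # 180 boards x 2 cores
CORES_PER_BOARD = 2
MEMORY_PER_BOARD_GB = 2   # four 512 MB DIMMs


def rack_totals(rack_cores=RACK_CORES):
    """Infer chassis count and memory for a full rack (hypothetical helper)."""
    chassis = rack_cores // CORES_PER_CHASSIS
    boards = rack_cores // CORES_PER_BOARD
    memory_gb = boards * MEMORY_PER_BOARD_GB
    return {
        "chassis": chassis,
        "boards": boards,
        "memory_gb": memory_gb,
        "memory_tb": round(memory_gb / 1024, 1),
    }


if __name__ == "__main__":
    print(rack_totals())
```

The inferred memory total lines up with the 9.8 TB quoted for a full rack.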
One more thing: It has no moving parts, and that increases reliability. And if storage needs to be added to the Molecule architecture, it will be flash memory.
The Molecule aims to run off-the-shelf HPC applications on top of Linux or Windows. Brown said that SGI was showing off the concept box to solicit input from prospective customers even before it creates an alpha box. If SGI sees enough interest, it could take 12 to 18 months to turn the concept into a product. If the idea is sound, let's hope it doesn't take that long. ®