Supercomputer buyers don't want to spend months building hybrid CPU-GPU clusters. They want to buy them pre-integrated and ready to start flopping within a matter
In the wake of the announcement of the Nvidia Tesla M2090 GPU coprocessor for servers two weeks ago, Marc Hamilton, vice president of high performance computing at Hewlett-Packard, said in his blog, challenged his HPC team to come up with a pre-integrated rack of servers that would deliver at least 10 teraflops of floating point performance and cost under $100,000.
The GPU Starter Kit, which will be launched at the HP Discover customer and partner shindig in Las Vegas next week, didn't need to use the M2090 fanless GPU coprocessor in the servers to hit the feeds and speeds Hamilton laid out. The starter kit has two of the ProLiant SL6500 tray server chassis, and eight of the ProLiant SL390s G7 2U compute nodes that slide into the chassis with room for three GPUs and that HP quietly launched in April.
The server nodes each have two Intel Xeon X5675 processors running at 3.06GHz, and across the eight nodes, that works out to a peak of 1.18 teraflops of double-precision floating point processing power. Each node was equipped with three M2070 fanless CPU coprocessors – these run at 1.15GHz and only have 448 out of the possible 512 cores activated – for a total of 12.36 teraflops of oomph at double-precision. That's a combined 13.54 teraflops in a rack across the CPUs and GPUs.
The GPU Starter Kit will come with Red Hat Enterprise Linux preinstalled on the nodes as well as HP's own Cluster Management Utility and Linux Value Pack extensions for HPC customers. The CUDA development environment and runtime will also be slapped onto the machines, too. The rack comes with one DL380 as a control node and a 36-port InfiniBand switch and a 24-port Ethernet switch. You basically turn it on, hook it up to networks and storage, and start running applications in under a day.
HP could make a much denser and more powerful ceepie-geepie machine if it wanted to. The first step would be to move to the M2090 GPU from Nvidia, which runs at a higher clock speed, has more memory bandwidth, and has all 512 cores on the GPU humming to deliver 665 gigaflops of double-precision math each. That yields just under 16 teraflops for 24 GPU coprocessors.
But HP could do better than this by switching to the 4U version of the ProLiant SL390s tray server, which has eight GPUs per two socket server. (There is plenty of room in the rack to do this). By switching to this bigger tray server and by putting in four SL6500 chassis, yields 31.9 teraflops of GPU performance plus the 1.18 teraflops from eight server nodes for a total of 33.1 teraflops of oomph. It is hard to say what HP might charge for this.
Presumably, the GPU Starter Kit will have a variant like the one outlined above, and it would be reasonable to surmise that it would cost somewhere around $150,000 to $175,000 if the setup outlined by Hamilton costs $100,000. (Nvidia does not provide pricing for the M series of Tesla GPUs, so it is hard to say for sure.) Perhaps equally significantly, there is room in the rack to put another eight of the SL390s G7 nodes in the 4U trays and double up the performance again in the rack to 66.2 teraflops for maybe $300,000 to $350,000.