Linux supercomputer maker Penguin Computing is ramping up its use of nVidia Tesla graphics processing units as co-processors for its x64-based Linux clusters.
Back at the SC08 supercomputing show in November, Penguin was showing off what it called a "personal supercomputer," the "Bumble Bee" Niveus HTX workstation, which uses Intel's "Seaburg" chipset and supports the "Harpertown" Xeon DP processors. The Intel side of the workstation has two quad-core E5400 chips running at 2.8 GHz or 3 GHz and from 8 GB to 32 GB of main memory.
The Tesla C1060 co-processors each have their own 4 GB of memory, and the three of them together have 720 GPU cores (240 apiece). They plug into three of the PCI-Express x16 slots in the motherboard. The nVidia Quadro FX370 video card (which can drive two monitors) eats up the remaining slot. The tower case has room for up to six 1 TB SATA or 32 GB SSD disks.
The whole shebang is rated at 4 teraflops and in a full-tilt-boogie configuration that has water-block cooling on the CPUs and GPUs runs around $15,000, according to Charlie Wuischpard, president and CEO at Penguin. That's about $3.75 per gigaflops.
This is not enough oomph for a lot of workloads, so after getting a little bit of experience selling and supporting the Tesla co-processors, Penguin is now plunking them into its Linux clusters. The company has created entry and midrange setups, both packaged and ready to run, to get customers up and running quickly on the Xeon-Tesla tech.
The first is an Altus 1702 cluster that is composed of four 1U servers, each with two Opteron-based servers packed into the chassis, side by side. These "twin" servers use single-socket motherboards and employ 2.3 GHz quad-core "Shanghai" Opterons and have 8 GB of main memory each. The configuration then adds nVidia's more powerful Tesla S1070 co-processors, which have four GPUs on a single PCI-Express card. A total of four of these Tesla S1070s go into the cluster (one for every other compute server node, since the tiny servers have only one PCI-Express slot each; presumably the other four slots are used for video cards and other peripherals).
The cluster comes preconfigured with Penguin's Scyld ClusterWare 4.0 management software, plus a baby 9U rack configured with a 24-port Gigabit Ethernet switch, power distribution units for the servers, cabling, and a three-year warranty. The whole shebang is rated at 16 teraflops and has a list price of $44,985 (including a one-year license to the software stack and the CentOS variant of Linux). That works out to about $2.81 per gigaflops.
The midrange hybrid cluster basically doubles up this machine in an 18U rack and provides 32 teraflops for under $89,000, or $2.78 per gigaflops. (There's only one Gigabit Ethernet switch in this doubled-up system.)
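For anyone who wants to check the price/performance math, here is a minimal sketch in Python that reproduces the dollars-per-gigaflops figures from the prices and ratings quoted above. The `systems` table and the `dollars_per_gigaflops` helper are illustrative names, not anything from Penguin, and the midrange price is taken as the $89,000 ceiling, so its result is an upper bound.

```python
# Prices (USD) and rated teraflops as quoted in the article.
# The midrange price is "under $89,000", so 89,000 is used as an upper bound.
systems = {
    "Niveus HTX workstation": (15_000, 4),
    "Entry cluster (Altus 1702)": (44_985, 16),
    "Midrange cluster": (89_000, 32),
}


def dollars_per_gigaflops(price_usd, teraflops):
    """Return price divided by rated gigaflops (1 teraflops = 1,000 gigaflops)."""
    return price_usd / (teraflops * 1_000)


for name, (price, tflops) in systems.items():
    print(f"{name}: ${dollars_per_gigaflops(price, tflops):.2f} per gigaflops")
```

Run as is, the loop prints one line per system, and the rounded values line up with the $3.75, $2.81, and $2.78 cited in the text.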
Penguin offers zero percent financing on its systems through a partnership with CIT Group, which underwrites leases for the company as well as other tech giants.
Wuischpard says that Penguin has been keeping a low profile since acquiring Scyld a while back and has been sticking to its knitting, creating the ClusterWare suite of tools and selling Linux clusters to HPC sites. The company has five machines on the current Top 500 list (but can't say which ones because they are classified). It had its best year ever in terms of sales last year. "We are profitable, and our gross margins and revenues are up as our costs are down," Wuischpard brags. Not a lot of server makers can say that these days.
The mini-clusters that the company has launched are not meant to be production machines for most customers, but rather starter systems for customers who want to see how GPU-style co-processors and the CUDA extended C programming environment that nVidia has created for them can be deployed to goose existing Linux applications. "A lot of work is being done for GPUs at the academic level, but we are getting interest from government and corporate customers," says Wuischpard.
What nVidia really needs is to get C++ and Fortran supported in the CUDA environment. It is a work in progress, although there are some interfaces into the GPUs available from Fortran and C++ applications, apparently. CUDA supports Linux and Windows platforms, and nVidia says it has shipped over 100 million CUDA-compatible GPUs. That's a lot of oomph that can be accessed to crunch numbers as well as display graphics. ®