Univa, the upstart HPC software company that forked Oracle's Grid Engine this January, has grafted it onto both the public Amazon EC2 cloud and private clouds based on the open source Eucalyptus framework that clones EC2.
Why on earth would you need to mix grid software, which harvests compute cycles from clusters of machines for high performance computing workloads, with clouds, which virtualize server instances and let you create and destroy them with ease? Because sometimes it costs less to be inefficient on the compute side and more efficient on the human side of running a cluster.
Long before there were virtualized clouds of compute and storage pools, gridding software such as Grid Engine was created to be a workload-management and job-scheduling layer on top of a cluster, generally to do massively parallel calculations on cheap server clusters or harvested cycles on PCs. This is all well and good, but provisioning the servers in a cluster to run programs such as Grid Engine is still a pain in the neck.
That's why Univa created its own homegrown provisioning tools for Sun/Oracle's Grid Engine or its own Univa Grid Engine 8.0, the fork off the Grid Engine project. But even if you use Univa's provisioning tools, they only lay Grid Engine down on bare-metal servers, and the underlying cluster itself – really its software stack – is not dynamic. It can't be rolled from one machine to another to, for example, get a performance boost from faster iron, nor can it be seamlessly burst out to public clouds such as Amazon's EC2 if the internal grid doesn't have enough oomph to do the crunching in the allotted time.
Perhaps more importantly, with grid software running on cloudy server infrastructure, you can use the cloud fabric as a workload manager – keeping workloads isolated while they run so they don't interfere with each other – much as a hypervisor and its virtual-machine containers have become a de facto workload manager/job scheduler for Windows and Linux operating systems.
All of these reasons are why customers using Grid Engine approached Univa to skyhook the grid software onto cloudy infrastructure. As Gary Tyreman, president and CEO at Univa, tells El Reg: "There are a lot of organizations that are trying to bring Eucalyptus and Grid Engine together in their virtual computing environment. That's why we are bullish about putting hypervisors under grids. But to be honest, virtualization is not a technology that a lot of HPC shops are overly familiar with."
This runs counter to the idea that HPC shops don't like to use virtualization because of the performance penalties it imposes for compute and, more importantly, network and disk I/O. Tyreman says that CPU overhead for virtualization was the bottleneck a few years back, but recent generations of Intel Xeon and AMD Opteron processors have integrated virtualization features that can minimize the CPU overhead to near nothing.
Univa conducted serial-workload tests running Grid Engine atop a cloud based on Oracle's Xen hypervisor – think electronic-design grids or life-sciences grids where there's not a lot of multithreading in the application and not a lot of communication across the server nodes – and found that the CPU overhead averages somewhere around 2 per cent on a cluster based on modern Xeon 5500 or 5600 processors.
That's no big deal – and for those kinds of workloads, a cloud will help make the grid cluster easier to manage. In some cases, the Xen scheduler inside the hypervisor is actually better than the scheduler inside of Red Hat Enterprise Linux for a particular workload, and putting it on a cloud boosts performance by 2 to 4 per cent.
The marriage of grids and clouds is not for everyone – at least not yet. Tyreman says that on parallel HPC workloads, where you are using the message-passing interface (MPI) protocol to move data around the cluster as part of a simulation, the performance degradation of running Grid Engine on virtualized server instances rather than on bare-metal servers can be on the order of 30 to 50 per cent. "The network I/O is what is so punishing," says Tyreman. "There are just so many layers of software."
The good news is that Univa can dispatch such parallel jobs to bare-metal Grid Engine machines. And as soon as I/O virtualization improves in the processors and chipsets, that overhead will be greatly reduced as well. (That's the plan at Intel and AMD, at least.)
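To make the trade-off concrete, here is a minimal sketch of how a site might decide where to send a job using the overhead figures quoted above (about 2 per cent for serial jobs, 30 to 50 per cent for MPI jobs). This is not Univa's or UniCloud's scheduling logic; the `Job` type, `choose_target()` and the `max_overhead` threshold are hypothetical names invented for illustration, and the sketch ignores the cases where the Xen scheduler actually speeds a job up.

```python
from dataclasses import dataclass

# Runtime overheads quoted in the article, as fractions of bare-metal runtime.
SERIAL_VM_OVERHEAD = (0.02, 0.02)   # roughly 2 per cent for serial jobs on Xen
MPI_VM_OVERHEAD = (0.30, 0.50)      # 30 to 50 per cent for MPI jobs


@dataclass
class Job:
    name: str
    uses_mpi: bool             # True for tightly coupled MPI simulations
    bare_metal_hours: float    # hypothetical runtime estimate on bare metal


def vm_overhead(job: Job) -> tuple[float, float]:
    """Return the (low, high) virtualization overhead for this kind of job."""
    return MPI_VM_OVERHEAD if job.uses_mpi else SERIAL_VM_OVERHEAD


def estimated_vm_hours(job: Job) -> tuple[float, float]:
    """Scale the bare-metal estimate by the quoted overhead range."""
    low, high = vm_overhead(job)
    return job.bare_metal_hours * (1 + low), job.bare_metal_hours * (1 + high)


def choose_target(job: Job, max_overhead: float = 0.10) -> str:
    """Hypothetical policy: virtualize only if the worst-case overhead is tolerable."""
    _, high = vm_overhead(job)
    return "cloud" if high <= max_overhead else "bare-metal"


if __name__ == "__main__":
    for job in [Job("blast-search", False, 10.0), Job("cfd-mpi", True, 10.0)]:
        low, high = estimated_vm_hours(job)
        print(f"{job.name}: {choose_target(job)} (VM estimate {low:.1f}-{high:.1f} h)")
```

With a 10 per cent tolerance, the serial job lands on the cloud and the MPI job stays on bare metal, which is the split Univa describes.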
The linkage between the Amazon EC2, Cloud.com, and Rackspace Cloud public clouds and any internal Eucalyptus clouds is done through a piece of software called UniCloud, which was developed by Univa to help a customer set up a cloud on EC2 running Grid Engine.
UniCloud supports the deployment of Grid Engine inside of Xen or VMware ESXi containers in whatever format the cloud framework supports – Amazon Machine Images (AMIs) on EC2, and so forth. Tyreman says that Univa has not added support for KVM yet because although "it is good and it is fast, it is not yet enterprise-ready".
Univa Grid Engine costs $99 per core per year if you want to run it on your internal bare-metal or virtualized cluster. The UniCloud add-on brings the price of the base Grid Engine 8.0 license up to $150 per core per year. To burst Grid Engine out to Amazon's EC2, you have to buy the EC2 instances and then pay a 2-cent-per-hour premium on top of the Amazon price. (That pricing is for a small instance; obviously it costs more on a larger EC2 instance.) Univa offers volume pricing as well for both internal and public cloud installations. ®
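The pricing works out as simple arithmetic, sketched below under a few assumptions: the $150 figure is read as the total per-core price including UniCloud (not an extra $150), volume discounts are ignored, and the Amazon hourly rate is a caller-supplied placeholder because the article does not quote one. The premium is a parameter since Univa's surcharge for larger instances is not given. Function names are hypothetical.

```python
from decimal import Decimal

# List prices quoted in the article (US dollars); volume discounts are ignored.
BASE_PER_CORE_YEAR = Decimal("99")        # Univa Grid Engine 8.0
UNICLOUD_PER_CORE_YEAR = Decimal("150")   # Grid Engine 8.0 plus the UniCloud add-on
SMALL_INSTANCE_PREMIUM = Decimal("0.02")  # Univa's hourly premium on a small EC2 instance


def internal_license_cost(cores: int, with_unicloud: bool) -> Decimal:
    """Annual licence cost for an internal bare-metal or virtualized cluster."""
    rate = UNICLOUD_PER_CORE_YEAR if with_unicloud else BASE_PER_CORE_YEAR
    return cores * rate


def burst_cost(instances: int, hours: int, amazon_hourly_rate: Decimal,
               premium: Decimal = SMALL_INSTANCE_PREMIUM) -> Decimal:
    """EC2 burst cost: Amazon's own hourly rate plus Univa's per-hour premium."""
    return instances * hours * (amazon_hourly_rate + premium)


if __name__ == "__main__":
    print("100 cores, Grid Engine only:", internal_license_cost(100, False))
    print("100 cores, with UniCloud:  ", internal_license_cost(100, True))
    # The 0.085 hourly rate is a made-up placeholder, not a quoted Amazon price.
    print("10 small instances for a day:", burst_cost(10, 24, Decimal("0.085")))
```

Using `Decimal` rather than floats keeps cent-level sums exact when totals are compared against an invoice.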