Platform Computing's Load Sharing Facility was one of the pioneering commercial programs for coordinating workloads running on parallel supercomputers, and it comes as no surprise that Platform wants to take what it knows about managing grids and apply it to managing clouds.
There are some big differences in the software stack and hardware used by HPC grids and compute clouds, but both types of clusters have one thing in common: They need some master program to be in charge and allocate resources to workloads. Scheduling batch jobs on parallel supercomputers is something that LSF has done since LSF V1.0 shipped in 1992.
The product was based on the concepts put forth by Songnian Zhou in his PhD thesis at the University of California at Berkeley. Zhou is one of the company's three co-founders and has remained its chief executive officer since its founding. Zhou's company has over 2,000 customers worldwide with more than 5 million processors under management in their grids, and he knows a mess - and an opportunity to sell management tools - when he sees one. That is why Platform is today launching a new product line, called Infrastructure Sharing Facility, or ISF for short.
"Making supply meet demand is the core of the grid technology that we have been delivering for years in LSF," explains Zhou. But LSF is a batch workload manager for a static set of compute and storage grid resources, and as such, it is not directly applicable to cloud computing as we know it in the commercial computing arena. Clouds employ server and storage virtualization and automated provisioning of server and storage slices as end users and their workloads come and go on the cloud. "This is what is really new."
Platform doesn't want to make its own hypervisors or storage virtualization or even do the provisioning of virtual machines or storage. Rather, it is positioning ISF as the master traffic cop on the cloud, linking into and orchestrating whatever tools cloud providers pick to manage their VMs. It can be a mix of things, such as the complete VMware vSphere or XenServer Essentials stacks or point products such as the VM management tools from BladeLogic or PlateSpin or the server provisioning and orchestration tools from IBM, Hewlett-Packard, Platform, or others. Platform doesn't care, any more than it cared what operating system, MPI stack, or network protocol cluster makers employed when they built their big HPC boxes to run technical workloads.
Just like LSF didn't magically turn a cluster of physical servers into a grid, ISF doesn't turn a cluster of virtual servers into a cloud. People still have to build their clouds, just like they had to build their clusters. What ISF does is schedule the provisioning of virtual infrastructure as cloud workloads change, much as LSF automated the scheduling of HPC application runs on a grid of machines.
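To make the traffic-cop idea concrete, here is a minimal sketch in Python of demand-driven provisioning through pluggable back ends. It is not Platform's code or API: the `Provisioner` interface, `FakeProvisioner`, `Orchestrator`, and the pool names are all hypothetical, and the in-memory back end merely stands in for whatever hypervisor or provisioning tool a cloud builder has chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Provisioner(Protocol):
    """Hypothetical adapter interface: one implementation per provisioning tool."""

    def start_vm(self, pool: str) -> str: ...
    def stop_vm(self, vm_id: str) -> None: ...


@dataclass
class FakeProvisioner:
    """In-memory stand-in for a real hypervisor or provisioning tool (assumption)."""
    name: str
    _next_id: int = 0
    running: List[str] = field(default_factory=list)

    def start_vm(self, pool: str) -> str:
        vm_id = f"{self.name}-{pool}-{self._next_id}"
        self._next_id += 1
        self.running.append(vm_id)
        return vm_id

    def stop_vm(self, vm_id: str) -> None:
        self.running.remove(vm_id)


@dataclass
class Orchestrator:
    """Tracks VMs per pool and delegates the actual work to each pool's back end."""
    backends: Dict[str, Provisioner]  # pool name -> back end that serves it
    vms: Dict[str, List[str]] = field(default_factory=dict)

    def reconcile(self, demand: Dict[str, int]) -> Dict[str, int]:
        """Bring each pool's VM count to the requested level.

        Returns the change applied per pool (positive = started, negative = stopped).
        Pools missing from `demand` are treated as wanting zero VMs.
        """
        changes = {}
        for pool, backend in self.backends.items():
            current = self.vms.setdefault(pool, [])
            wanted = max(0, demand.get(pool, 0))
            delta = wanted - len(current)
            for _ in range(delta):       # scale up; no-op when delta <= 0
                current.append(backend.start_vm(pool))
            for _ in range(-delta):      # scale down; no-op when delta >= 0
                backend.stop_vm(current.pop())
            changes[pool] = delta
        return changes
```

Calling `reconcile({"web": 3, "batch": 1})` and later `reconcile({"web": 1})` on an orchestrator with `web` and `batch` pools would start four VMs and then stop three. The orchestrator never needs to know which tool sits behind each pool, which is the kind of agnosticism the article describes; a real product would also have to deal with placement, failures, and policy, none of which this sketch attempts.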
Zhou says that ISF has been in development for more than three years and leverages the existing Enterprise Grid Orchestrator (EGO) and Virtual Machine Orchestrator (VMO) tools created by Platform. EGO is an earlier grid orchestration tool designed for traditional HPC applications (and a superset of LSF) that was extended to support Java virtual machines and parallel data mining/data warehousing applications, while VMO is a dynamic resource management tool that is itself packaged up as a XenServer appliance and will soon be able to provision VMware ESX Server or Microsoft Hyper-V VMs on a pool of servers.
ISF is itself a collection of C++ and Java code, and it runs on a dedicated server (either on Windows or Linux) that deploys agent software to server nodes and storage arrays so it can hook into whatever provisioning tools companies deploy. ISF just reaches out and talks to whatever tools are there, and it is Platform's intent to be agnostic about operating system, hypervisor, or server type. Getting support for a wide variety of OSes and VMs is going to take time, perhaps a year or two, according to Zhou. But Platform will be hitting the biggest targets first, as software companies always do.
The first beta of the code, which is what Platform is announcing today, supports VMware's ESX Server 3.5 hypervisor for x64 servers. It is in the process of being certified for the new ESX Server 4.0 hypervisor that is part of the vSphere stack, and it will soon be certified to support Citrix Systems' XenServer 5.5, which just started shipping last week.
Zhou says that support for Microsoft's Hyper-V is coming to ISF within months and that the company intends to add support for Sun Microsystems' Solaris containers and Red Hat's KVM virtual machines next. As customers demand it, Platform is perfectly happy to add support for IBM's PowerVM logical partitions on its Power-based iron and Hewlett-Packard's Integrity Virtual Machines for its Itanium-based servers. It is going to take time to get ISF integrated and certified for the large number of server and storage provisioning tools out there on the market.
ISF will ship in the fall. While Platform is not supplying specific pricing information yet for ISF, cloud operators - whether they run public clouds or internal private clouds - are cheapskates. And Platform knows that it can only charge a few hundred dollars per server box for the tool, with volume discounts obviously. "It is intended to be cost effective," says Zhou.
In a separate announcement, Platform has also rolled out a new subset of its HPC software stack called HPC Workgroup Manager. It is aimed at channel partners who want to get into distributing grid software, as Dell, Red Hat, and Hewlett-Packard all do, but without having to go through the formal partnership and customization that those companies have. HPC Workgroup Manager includes Cluster Manager (formerly known as Open Cluster Stack 5), LSF Workgroup Edition, and Platform's own message-passing interface (MPI) parallel computing stack. The software is certified by Intel as being Cluster Ready, which means you can load and go on specific Xeon-based servers. ®