This article is more than 1 year old
IBM rejigs Platform control freakery for supers
Hadoop MapReduce clone, new HPC stacks ship this month
Big Blue wants everything to be about Smarter Planet, but its acquisition of grid computing pioneer Platform Computing back in October 2011 was about giving IBM a place in the cloudy, gridded, and automated cluster management space and some necessary tools to maintain some sort of control – both literally and financially – in the modern data center.
It has been eight months since IBM did the Platform Computing deal for an undisclosed sum, and it was done in the wake of some interesting product development at the smaller Canadian software company. IBM is itself a very large Canadian software company, with its database and development tool labs located in Toronto.
As soon as IBM bought Platform Computing, not much was heard from it, and that is because it has taken IBM time to figure out how to best package up the smaller firm's various grid management, cloud fabric, and workload scheduling programs. Platform Computing is now a unit of IBM's Systems and Technology Group, which makes and sells chips, servers, storage, and networking; the Platform tools sit next to operating systems and high availability clustering tools that are sold on IBM systems.
You might be wondering what took so long. When IBM bought Platform Computing, explains Jay Muelhoefer, global marketing leader for the Platform unit, the control freak peddler had over 800 SKUs in its product catalog, aimed at specific use cases, industries, and partnerships. While this gave Platform Computing lots of different things to pitch to customers, this was not the best way to push pump revenues out of a direct sales force or a large reseller channel, as IBM has.
It has taken some time to streamline the products and retool the sales force to go after big deals and assist IBM's channel in chasing many smaller deals. With that work done, and IBM tucking Platform underneath Systems and Technology Group (instead of inside of Software Group as expected), IBM is now re-launching the flagship Load Sharing Facility (LSF) grid workload scheduler and Symphony Java messaging grid software, and is previewing updates that are due shortly for its Platform HPC and Platform Cluster Manager tools.
In case you don't know, Platform Computing was founded in 1992 and its workload scheduler is derived from the work done by Songnian Zhou in his PhD thesis at the University of California at Berkeley in 1987, as well as with the independent work on load sharing in distributed systems from Platform co-founder Jingwen Wang.
The company shipped LSF 1.0 in 1992, and it was quickly adopted to scheduler jobs on big clusters at Nortel, Pratt & Whitney, and the Los Alamos National Laboratory. Two years later, chip makers Advanced Micro Devices and Texas Instruments added LSF to their electronic design clusters.
HP partnered with Platform Computing in 1995, Silicon Graphics in 1998, and IBM and Compaq in 2001. Dell and NEC jumped on board in 2002, Microsoft and Oracle a year later, Red Hat partnered with the company in 2007, and Fujitsu got around to it in June of last year.
Everybody used to help peddle platform tools, and for their own revenue streams. Now, IBM wants that business for itself and for its channel partners and wants Platform's software to give it differentiation in an IT world that will be dominated in many ways by pools of distributed computing that will need some orchestration.
LSF is basically a version of Tetris that a cluster plays on itself, matching distributed number-crunching work with available resource. The Platform Symphony Java messaging grid is similarly designed to allow distributed Java workloads to be spread over a grid of servers and executed in parallel and very quickly, which is important for the financial services customers who use Symphony, by scheduling the appropriate work to be move to an available spot on the cluster.
As part of the re-organization of Platform products, IBM is creating developer, express, standard, and advanced editions of the various Platform grid and management tools, which will have increasing numbers of features and higher prices as you move up the editions. This is consistent with how IBM sells its operating systems, databases, middleware, and other systems software.
With LSF V8.3, IBM is creating two editions of the classic workload scheduling tool. The Standard Edition is what Platform has been selling all along, which is designed for throughput workloads (something on the order of 6,000 nodes, 60,000 or so cores, and 60,000 or so concurrent jobs in the queue) or for parallel HPC workloads (maybe 10,000 nodes, 100,000 cores, and 10,000 concurrent jobs in the queue).
This is overkill for some of the smaller shops that are setting up clusters, particularly in life sciences and other medical-related fields where researchers are not Linux and clustering experts and have no desire to be. To reach these customers, and give the IBM channel something to push to the volume customer, IBM has created LSF Express Edition, which has a less sophisticated workload manager and which is targeted at 100 nodes or less.
The way IBM is carving up the LSF stack is as follows. You get the Standard or Express Editions, and then you add other modules to them. The new LSF Standard Edition replaces the prior LSF, LSF Client, SLF MultiCluster, and Make modules that Platform peddled when it was independent. IBM doesn't say so, but the Express Edition is probably based on the old LSF Workgroup version.
The add-on modules for LSF include Process Manager (workflow automation), License Scheduler (software license management), Analytics (a visual tool for seeing what is going on inside a cluster), and RTM (short for report, track, and monitor and including dashboards and reporting functions). Two other modules in development for release in December as add-ons to LSF include Dynamic Cluster and Session Scheduler; the Application Center Web-based portal for interfacing with LSF is also a separate product, and it includes pre-baked templates for popular HPC applications.
The new LSF V8.3 stack is certified to run on and control various x86 machinery as well as on IBM's own Power Systems servers running either AIX or Linux.
IBM is also rolling out a new V8.3 update to Platform's messaging passing interface (MPI) stack for HPC clusters. The new release has a bunch of nips and tucks to boost the performance of applications running on parallel supercomputer clusters, and it is not tied in any way with the LSF scheduler.
It works fine with Altair's PBS Pro, Grid Engine (in its many variants), the open source SLURM scheduler, or other schedulers and resource managers. Platform MPI V8.3 supports Ethernet, InfiniBand, and Myrinet interconnects on clusters, and supports remote direct memory access (RDMA) acceleration on InfiniBand and Ethernet as well as the new GPUDirect RDMA extensions that Nvidia has added with its latest Tesla K10 and K20 GPUs.
The MPI stack runs on server nodes equipped with Red Hat Enterprise Linux 4, 5, and 6; SUSE Linux Enterprise Server 10 and 11; CentOS 5, or Windows Server 2003 or 2008 (including the HPC Edition). GNU, Intel, PathScale, and Portland Group compilers all know how to speak Platform MPI.
In addition to the reworked LSF and MPI software, IBM is previewing new versions of Platform HPC 3.2, the cluster management bundle that it is aiming at smaller clusters and will bundle with its own System x and BladeCenter servers.
IBM is also cooking up a new release of Platform Cluster Manager, the open source entry cluster management stack that is presumably based on LSF Standard Edition. (IBM has not said.) It also didn't say very much about these forthcoming releases, but apparently there will be a Platform Cluster Manager Advanced Edition aimed at helping companies create multi-tenant HPC "clouds," and by cloud IBM means grid, not virtualized server instances.
If you are having a hard time keep track of what is what in the Platform lineup, here's a cheat sheet:
IBM's Platform Computing HPC editions
Presumably over time, IBM will flesh this grid out, as it has done with its Symphony grid middleware for Java applications. It is not clear what IBM did with Infrastructure Sharing Facility (ISF), which is a variant of LSF that was created explicitly for managing virtual server instances in clouds. IBM's own site makes no mention of it in its search engine.