HPC Blog

Multicore processors drive everything these days from the biggest HPC cluster to the lowliest tablet – even smartphones. While parallel programming has come quite a way, there are still many apps that aren’t well-behaved at all.
They’re the worst kind of guests – acting like they own the whole damned house while paying absolutely no attention to the needs of other residents.
They’ll grab more memory than they need and never let it go. They’ll spawn enough threads to crowd out everyone else; it’s like inviting their deadbeat friends over to watch the Super Bowl at your house and eat your snacks. Operating systems and virtualization mechanisms attempt to control unruly apps, but they don’t have the ability to completely control and prioritize system resources.
Enter exLudus, and what they’re calling the industry’s first micro-virtualization solution, intuitively named MCOpt. What we’re talking about is a suite of software packages that provides dynamic workload containers, workload characterization, and performance monitoring and management for Linux operating system instances. It works at the node level, as a layer between the Linux kernel and the applications running on top of it.
With MCOpt, users can dictate the priority of apps and jobs, and the system will automatically adjust core and memory shares to ensure that the priorities are satisfied. It’s not a static fair-share scheduler; it works dynamically to constantly adjust resource shares and job timing so that SLAs are met and the system achieves maximum throughput.
It can do this because it’s monitoring how each job is using cores and memory. It can spot when a job isn’t using all of its allocated memory or core shares (or if it’s trying to use too much of either) and make adjustments on the fly to keep everything running smoothly and according to business priorities.
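To make the idea of priority-driven, usage-aware shares a bit more concrete, here is a minimal sketch of one well-known way to do it: a weighted max-min fair split, where no job gets more cores than it was observed trying to use and the leftovers flow to whoever still wants them. This is an illustration only, not exLudus's algorithm; the `Job` fields and the `rebalance` function are hypothetical names invented for the example, and priorities are assumed to be positive numbers.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: float  # relative weight (assumed > 0); higher means more important
    demand: float    # cores the job was observed trying to use


def rebalance(jobs, total_cores):
    """Weighted max-min fair split of total_cores, capped by each job's demand."""
    shares = {job.name: 0.0 for job in jobs}
    active = [job for job in jobs if job.demand > 0]
    remaining = float(total_cores)

    while active and remaining > 1e-9:
        weight_sum = sum(job.priority for job in active)
        # A job is satisfied if its demand fits inside its weighted slice.
        satisfied = [
            job for job in active
            if job.demand <= remaining * job.priority / weight_sum
        ]
        if not satisfied:
            # Everyone wants more than their slice: hand out the slices and stop.
            for job in active:
                shares[job.name] = remaining * job.priority / weight_sum
            break
        # Give satisfied jobs exactly what they asked for and recycle the surplus.
        for job in satisfied:
            shares[job.name] = job.demand
            remaining -= job.demand
        active = [job for job in active if job not in satisfied]

    return shares
```

With 8 cores, a priority-3 job that only wants 2 cores and a priority-1 job that wants 10, this hands out 2 and 6: the important job gets everything it asked for, and the cores it isn't using go to the other job instead of sitting idle. A dynamic manager would presumably re-run something like this every time the observed demand changes.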
In our discussion, the exLudus folks talked about the Linux scheduler and how it can unpredictably cause job priorities to change during execution – which isn’t necessarily a bad thing. But it can make it difficult to pinpoint when resource contention is hindering overall performance. It also means that subsequent re-runs of the same set of jobs will result in different contention behaviors.
MCOpt can also save important work from falling victim to the Linux Angel of Death – the Out-Of-Memory Killer. When system RAM is oversubscribed, there’s a risk that the OOM Killer can swoop in (well, it doesn’t really swoop) and kill processes to free up memory.
MCOpt helps in two ways. First, it keeps apps from oversubscribing memory and thus prevents the OOM Killer from coming into play in the first place. Second, it can steer OOM Killer behavior to protect high-priority workloads. With MCOpt, subsequent re-runs of the same set of tasks will behave exactly the same way – meaning predictable application performance even under stress. (System stress, not personal stress.)
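For the curious, stock Linux already exposes a knob for steering the OOM Killer: each process has a `/proc/<pid>/oom_score_adj` file that accepts values from -1000 (never kill) to 1000 (kill first). The sketch below maps a job priority onto that range. We don't know whether MCOpt uses this mechanism; the 0-to-10 priority scale and the function names `oom_adj_for_priority` and `protect` are assumptions made up for the example.

```python
OOM_ADJ_MIN, OOM_ADJ_MAX = -1000, 1000  # range accepted by /proc/<pid>/oom_score_adj


def oom_adj_for_priority(priority, max_priority=10):
    """Map a priority in 0..max_priority to an oom_score_adj value.

    Higher priority yields a lower adjustment, making the process a less
    likely OOM Killer victim. The priority scale is a hypothetical one.
    """
    priority = max(0, min(priority, max_priority))  # clamp out-of-range input
    span = OOM_ADJ_MAX - OOM_ADJ_MIN
    return OOM_ADJ_MAX - round(span * priority / max_priority)


def protect(pid, priority, max_priority=10):
    """Write the adjustment for pid; lowering the value typically needs root."""
    with open(f"/proc/{pid}/oom_score_adj", "w") as f:
        f.write(str(oom_adj_for_priority(priority, max_priority)))
```

On this made-up scale, priority 0 maps to 1000 (first in line for the chop), priority 5 to 0 (the kernel default), and priority 10 to -1000, which exempts the process from the OOM Killer altogether.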
This high level of control can really help overall throughput. The company says that their tests show a 20 to 50 per cent increase in total throughput with MCOpt versus stock Linux, a figure that already accounts for MCOpt’s own (low) overhead. They have some white papers and stuff available, plus free trial versions of their software too.
exLudus also made sure to point out that MCOpt doesn’t require any application or OS modifications – the MCOpt layer is transparent to both. Better yet, MCOpt can work with other cluster management and virtualization suites if needed – sort of as a subcontractor.
exLudus brings an interesting set of capabilities to the Linux workload management table. It’s like taking a previously unmanageable city traffic plan (Boston? The Bay Area?) and adding synchronized lights and a set of maniacally focused traffic managers. It’s definitely worth a look if you’re seeing signs of road rage between competing apps on your Linux systems. ®