Twitter's Mesos brainbox joins data centre OS venture
PhD-er whose web-scale idea went big in one year
The academic brainbox who got his data centre operating system adopted as a piece of critical infrastructure at Twitter has now gone into business.
Benjamin Hindman, the co-founder of open-source cluster manager Mesos – which runs at large web properties including Twitter and Airbnb – has joined VC-backed Mesosphere. The startup was founded in 2013 to drive a paying business around the cluster manager he built as a student.
Hindman started Mesos as part of his PhD thesis at UC Berkeley in 2009 and now leads the project at Apache – as Apache Mesos, an Apache project since 2010.
Twitter picked up Mesos in 2010, and hired Hindman to turn the project into a production system at the time Twitter was being re-written in Java from Ruby on Rails.
Mesos now runs key services at Twitter, including analytics and ads.
Chris Fry, Twitter’s senior vice president of engineering, has called Mesos "the cornerstone of our elastic compute infrastructure". Fry adds: “It’s how we build all our new services and is critical for Twitter’s continued success at scale.”
Hindman puts things more succinctly. “If Mesos goes down, Twitter goes down,” he told The Reg this week.
Mesos is proving a rapid hit - grabbed for use in production by domestic hotel biz Airbnb and by PayPal, eBay and OpenTable. Google also has its own version of Mesos, called Borg and developed independently, but the gnomes inside Mountain View's server halls haven’t yet released it.
That’s the opposite of what happened with MapReduce, where Google published a paper that bred Hadoop.
The goal of Mesosphere is to create an ecosystem around the Apache project by making the system easier to implement for ordinary users in the enterprise.
“We want to create a system that will be used by anybody who writes new distributed apps,” Mesosphere co-founder and CEO Florian Leibert told The Reg on Hindman's hiring.
Leibert got $10.5m in June in series A funding led by Andreessen Horowitz, with additional investors Data Collective and Fuel Capital.
The German software engineer had worked at Twitter, and it was he who persuaded Hindman to join the firm before himself leaving to implement Mesos at Airbnb.
Leibert’s plan now is to build paid-for tools and plugins for Mesos, in areas like security and management, improving integration with apps running on Mesos.
The business will likely provide services to help those hungry for Mesos implement the system on their own distributed data centres and clustered nodes.
“Now we get to add in the features [to Mesos] we have been talking about for a while,” Hindman told us. Apache released Mesos 0.20.1 this week, with native support for Docker containers. Earlier this month, Mesosphere bought low-latency database OrlyAtomics.
It has become the classic open-source business model, the type pursued by companies like Cloudera and Hortonworks, which have built support and consulting businesses around Hadoop.
Which raises the question: Is Mesos the next Hadoop?
There are parallels, at least in terms of the project’s intent to run data-centre tasks at massive scale.
Hindman describes Mesos as an operating system for the data centre or a kernel for distributed systems. The idea is that you can build and run apps on distributed servers, with Mesos taking care of the complexity. It’s the "software middleman" between the app and the data centre.
The name Mesos hints at this: hailing from the Greek for “middle”. For space nerds, the Mesosphere is the layer of Earth's atmosphere above the stratosphere and below the thermosphere. It sits 30-50 miles above the Earth’s surface and – with the stratosphere – is referred to as the Earth’s “middle atmosphere.”
Another feature guaranteed to pique broader interest in Mesos is that the software is multi-language and open-source – written in C++, Python, Ruby, JVM and Go – and it cuts down the lines of code you need to run apps at scale.
What scale? Mesos spans tens of thousands of servers at Twitter.
Mesos works by using features in operating system kernels that handle resource isolation, prioritisation, constraints and reporting. It manages CPU, memory, file system and other resources.
The Mesos architecture consists of a master, slaves and frameworks that run tasks on the slaves. The slaves report their free resources to the master, which determines what resources can be offered to each framework; the framework picks what it needs from the offer, and the master then launches its tasks on the slaves.
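For the code-minded, here is a toy sketch of that offer cycle in Python. It is not the Mesos API: every class, method and task name below is made up for illustration, and the master simply offers each slave's free resources to the frameworks in list order, which is far cruder than whatever allocation policy a real cluster uses.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    """A worker node that reports its free resources to the master."""
    name: str
    cpus: float
    mem_mb: int
    tasks: list = field(default_factory=list)


@dataclass
class Framework:
    """Hypothetical framework: a queue of (task_name, cpus, mem_mb) tuples."""
    name: str
    pending: list

    def consider(self, cpus, mem_mb):
        """Return the pending tasks that fit into the offered resources."""
        accepted = []
        for task in list(self.pending):
            _, need_cpus, need_mem = task
            if need_cpus <= cpus and need_mem <= mem_mb:
                accepted.append(task)
                self.pending.remove(task)
                cpus -= need_cpus
                mem_mb -= need_mem
        return accepted


class Master:
    """Tracks free resources per slave and offers them to each framework."""

    def __init__(self, slaves, frameworks):
        self.slaves = slaves
        self.frameworks = frameworks

    def offer_round(self):
        for slave in self.slaves:
            for fw in self.frameworks:
                # Offer whatever is currently free on this slave.
                accepted = fw.consider(slave.cpus, slave.mem_mb)
                for task_name, cpus, mem_mb in accepted:
                    # "Launch" the task: deduct its resources from the slave.
                    slave.cpus -= cpus
                    slave.mem_mb -= mem_mb
                    slave.tasks.append((fw.name, task_name))


def demo():
    slaves = [Slave("s1", cpus=4, mem_mb=8192), Slave("s2", cpus=2, mem_mb=4096)]
    frameworks = [
        Framework("hadoop", [("map-1", 2, 2048), ("map-2", 2, 2048)]),
        Framework("chronos", [("nightly-job", 1, 1024), ("hourly-job", 1, 512)]),
    ]
    Master(slaves, frameworks).offer_round()
    return slaves, frameworks
```

Calling `demo()` runs one offer round: the hypothetical Hadoop tasks soak up the first slave, and the Chronos jobs land on the second. The point of the design is that the master only decides what to offer, while each framework decides what to run.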
Frameworks supported by Mesos include - yup - Hadoop, plus Spark, Storm and Cassandra on big data, Chronos for distributed Cron and Docker containers.
Hindman says Mesos beats commercial cluster managers like IBM’s Symphony and Microsoft’s Autopilot for scheduling and grid controllers like Univa.
The reason? It is developer friendly – with that open-source and multi-language aspect – while there’s also an SDK.
The concept is making programming for the data centre simple: you program as if the data centre were one single run time, rather than building for pieces inside like a networking layer or different servers and crunching through head-scratchers like resource allocation and steps to take if your application fails or there’s not enough memory.
“We expose the API in a programmable way. So people could build applications like Chronos rather than take a cluster manager and build a distributed system on top of that,” says Hindman.
Mesos also runs on Google’s Compute Engine and on Amazon Web Services. Apps that run on Mesos include Impala, JBoss, MySQL, Django and Rails, and the whole thing can run on Linux, OpenSolaris and OS X.
Get us working in 140 characters or less
Hindman was hired by Twitter to get Mesos working; Twitter at the time was rewriting itself – going from the hipstertastic Ruby on Rails language to your dad’s Java.
The rewrite was used to speed up performance and break up Twitter into separate apps and services.
Twitter had been built as a massive monolithic app, cruelly named Mono Rail, which made changing the individual internals difficult. The rewrite broke out components such as key value stores and end-user features like SMS alerts to your phone as individual services, making them easier to change and run.
Hired in 2010, Hindman had Mesos in production at Twitter by 2011.
The key use case for Mesos seems to be where you want to break out parts of your applications or data centre as individual services – like Twitter. Componentising monolithic applications makes the sum and the parts easier to develop and update, but more challenging to manage – hence Mesos, with its framework approach.
The irony in all this?
Massively clustered data centre systems weren’t the starting point for Hindman’s PhD work: rather, he'd been investigating bridging parallel chip cores. He switched, he says, because there wasn’t as much application for this idea spanning cores as there was spanning a data centre's nodes.
Looks like he could be right. ®