Open ... and Shut
In the Big Data market, Hadoop is clearly the team to beat. What is less clear is which of the Hadoop vendors will claim the spoils of that victory.
Because open source tends to be winner-take-all, we are almost certainly going to see a "Red Hat" of Hadoop, with the second-place vendor left to clean up the crumbs.
As ever with open source, this means the Hadoop market ultimately comes down to a race for community support because, as RedMonk analyst Stephen O'Grady argues, the biggest community wins.
In community and other areas, Linux is a great analogue for Hadoop. I've suggested recently that Hadoop market observers could learn a lot from the indomitable rise of Linux, including from how it overcame technical shortcomings over time through communal development. But perhaps a more fundamental observation is that, as with Linux, there's no room for two major Hadoop vendors.
Yes, there will be truckloads of cash earned by EMC, IBM and others who use Hadoop as a complement to drive the sale of proprietary hardware and software, just as we have seen in the Linux market with IBM, Oracle, Hewlett-Packard and others.
But for those companies aspiring to be the Red Hat of Hadoop - that primary committer of code and provider of associated support services - there's only room for one such company, and it's Cloudera or Hortonworks. I don't feel MapR has the ability to move Hadoop development, given that it doesn't employ key Hadoop developers as Cloudera and Hortonworks do, so it has no chance of being a dominant Hadoop vendor.
Cloudera and Hortonworks recognise this, which is why both have raised mountains of cash. The size of the Big Data pie is huge, but it's not going to be split evenly. Only one company gets to be the centre of the Hadoop ecosystem. Not two.
In enterprise Linux, that "one company" is Red Hat. SUSE (then Novell then just SUSE again) initially took Red Hat on and had a real chance to be the leader, but Red Hat persevered and became the billion-dollar open-source company while SUSE-Novell-SUSE did not.
Why did Red Hat win? Community.
No, not the kind of community we sometimes associate with open source, ie, individual hackers staying up late for the love of coding, though that demographic matters. The first community Red Hat won was that of code contributors: it contributes more to the Linux kernel than any other single individual or company.
This, in turn, led Red Hat to attract the second type of community: the "professional developer," or third-party application developer. Red Hat managed to amass an unassailable lead in its third-party application ecosystem. Ultimately, in the Hadoop battle the community to be won is this community of developers building around the Hadoop ecosystem, because it's this ecosystem that leads to customer adoption, which fuels revenues, which in turn fuel the hiring of more code committers.
Call it the virtuous cycle of commercial open-source community development.
From 2002 until 2005, I worked at Novell and after the SUSE acquisition saw first-hand how Red Hat used its third-party application ecosystem to crush SUSE. SUSE was always second choice with customers because the applications they wanted ran on Red Hat first, which in turn made SUSE second-best with partners, too. By the time Novell/SUSE finally caught up in terms of sheer number of applications (and now exceeds Red Hat), Red Hat had already cemented its brand and Novell's Linux business languished.
As Linux Foundation executive director Jim Zemlin is fond of saying: "In a world of tissue when you're Kleenex, you've won." When Red Hat became "Kleenex," the game was over.
In the Hadoop world, the race to be "Kleenex" is on, and it involves attracting the biggest ISV community. Between the two dominant Hadoop distributions, it's still a somewhat even race, even if Cloudera took the early lead with customer traction. Hortonworks has been playing up its open source purity, arguing that it's "true" open source while Cloudera offers a freemium/open core model. It's very similar to the argument that Red Hat used to use against Novell/SUSE.
But in this case, I don't think it applies.
Both Cloudera and Hortonworks contribute to and distribute 100 per cent open-source Hadoop platforms. The difference comes from the management and other tools each offers alongside Hadoop. Hortonworks believes even this area should be open source, which is why its rival to Cloudera Manager is open-source Ambari.
The problem, however, is that Ambari isn't as mature as Cloudera Manager. In these early days of Hadoop adoption, customers and partners will skew toward the solution that works best, and that's currently Cloudera. For years, the Red Hat Network product and associated technology weren't open source, and no one cared. What customers wanted was to be as productive as possible, as fast as possible.
Still, both companies are in a land grab for quality partners. Unlike in Linux land, there's not One Partner to Rule Them All, as Oracle was for Red Hat. Hortonworks has grabbed Microsoft and Informatica as partners (among others), while Cloudera has IBM and Oracle (among others). In terms of volume of partners, Cloudera has the lead with more than 300 partners (compared to Hortonworks' 62). Of course, Cloudera only lists 51 partners on its website, which suggests that maybe Hortonworks has more partners, too, but hasn't listed them.
Advantage: Cloudera (probably).
But let's get back to fundamentals. Who employs the most core committers to Hadoop, Cloudera or Hortonworks? After all, this tends to be the metric that helps fuel traction with third-party application developers. Unfortunately, there's not a clean answer. By one measure, Cloudera has a slight lead over Hortonworks.
But by Cloudera's own admission, there are multiple ways to measure the two companies' code contributions to Hadoop. Both companies employ several of Hadoop's heavy hitters.
In short, it's too soon to call a winner. Cloudera has a two-year head start and significantly more revenue and general interest, at least as measured by Google searches. But the ultimate prize is reserved for the company that can amass the most meaningful application partners given that Hadoop, like Linux before it, is a platform play.
The platform with the biggest community wins. Every time. Who that winner will be in the case of Hadoop is still not clear. ®
Matt Asay is senior vice president of business development at Nodeable, offering systems management for managing and analysing cloud-based data. He was formerly SVP of biz dev at HTML5 start-up Strobe and chief operating officer of Ubuntu commercial operation Canonical. With more than a decade spent in open source, Asay served as Alfresco's general manager for the Americas and vice president of business development, and he helped put Novell on its open source track. Asay is an emeritus board member of the Open Source Initiative (OSI). His column, Open...and Shut, appears three times a week on The Register.
Mike Olson is on the board of Directors of Nodeable & CEO of Cloudera.