Interview In a side room of this year's Strata + Hadoop conference at the ExCel centre in London, Hadoop creator Doug Cutting spoke to The Register about finding proprietary value in the open-source world, and Cloudera's “not entirely commercial” opposition to the Open Data Platform.
Cutting created Hadoop, the open-source software framework named after his son's toy elephant, while at Yahoo!, though he joined Cloudera, which provides its own Cloudera Distribution Including Apache Hadoop (CDH), in 2009, by which time Hadoop had been donated to the Apache Software Foundation.
Despite his commitments both to the open-source technology and to the company offering a proprietary version of it, Cutting has, he says, committed himself to an independence of opinion: “It's something that I've always tried to maintain, to not be too partisan. I try to reflect all the interests in the community.”
“I'm by no means perfect at that,” he acknowledged. “I can't help but have some biases from my perspective, but I do my best not to be a pitch man, but rather to think about this broader community, and I think Cloudera recognises that that's in Cloudera's interest that we have a vibrant diverse community of vendors.”
Cutting has lived up to his name when commenting on certain projects around Hadoop, and none received more ire than the Open Data Platform initiative (ODPi), which he told us “is actually something which isn't part of the open source community; really, it's a separate foundation.”
“It's now part of the Linux Foundation, which is also a consortium, a club of companies, that you pay to join. It's really not this completely community-owned thing like Apache, where the individual developers control the fate, and so I still don't understand what role it's attempting to fill. I see the claims that it's trying to build a standard platform; I actually think standardisation isn't called for and isn't productive.”
His criticisms are joined by those of fellow Cloudera bloke Mike Olson, who said last year that “Cloudera’s partner ecosystem includes 1,447 companies [at the time] ... We’re simply not hearing from them that they’re confused about building applications on core Hadoop.”
Cutting said: “I think we're in a time – and we may be for a long while yet – of tremendous innovation, and that innovation works by having lots of options out there and not having them reduced too much. We have vendors already reducing them and saying 'Here's the set of things that we want to push you towards' and we have Apache, which is sort of a free-for-all, and I'm not sure what value there is in something in-between.”
There is, of course, also the ODPi's interest in Ambari – an open source management platform which is a direct competitor to Cloudera's proprietary offering, Cloudera Manager, which Cutting unsurprisingly reckons is “considerably more advanced”.
“I don't think Ambari has a wide range of contributors, nor a wide range of users,” Cutting told The Register. “I think it tends to predominantly be Hortonworks users.”
Management systems are not “amenable to open source,” said Cutting, as they are unlike storage systems and computing engines “where we see a lot of different folks sharing this common technology.”
Cloudera's opposition to the ODPi is “a pragmatic thing,” said Cutting, adding: “We have a better solution, we're not searching for an inferior management tool, we've got a fine management tool for our customers which works well as part of our integrated support offering, so it's not something that we would want to standardise on something else.”
“There's commercial aspects to that, I suppose, but it's not entirely commercial,” said Cutting.
Proprietary versus Open Source
The commercial world of what has, irksomely, been dubbed “big data” poses some interesting questions to the simpler narrative which sees the big bad proprietary giants challenged by revolutionary guerilla open-source hackers. The apparent marketplace of open-source technologies has been well established, reckoned Cutting.
“It depends on the domain,” he told us. “If you're talking about platform technologies that store and process data I think we're seeing open source winning again and again and again. I think on the other hand for services, things that people subscribe to, some service that is implemented by software, people don't necessarily expect that to be open source.”
“You don't expect Facebook, you don't expect Google Search, you don't expect Salesforce to be open source software. That's a service that you use, and I think that's more the direction that things are headed. I think in a lot of ways Cloudera Manager is a service that people run on premises to help manage their cluster. It isn't their data, it's rather something that helps them manage their data and manage their services, their open-source software stack.”
We've gone from a world in which enterprises licensed implementations of relational databases for their own core technologies to a world in which the core technologies themselves and those implementations are open source, said Cutting.
“People are running them more and more in the cloud, and the services they use to manage them in the cloud is where I think the proprietary value is going to be,” he stated.
On the other hand, however, Cloudera has “seen that just paying for support calls won't sustain a vendor. That's not a viable business model,” said Cutting, before adding: “Hortonworks is obviously attempting that, and we'll see if they can actually achieve profitability on that basis. We've looked at it and we don't believe it can be done at the rates people are willing to pay for support.”
“It's worth having different models tried,” said Cutting. “I don't believe there should just be one vendor in the space, it's healthy to have competing options.”
Vendors need to find what proprietary value they can provide, said Cutting. That the proprietary value is not the core technology stack is “healthy for users,” he added, “in that they can scale the business and not be locked in.”
Hadoops like yellow elephants
Though Hadoop was invented by Cutting, one of Cloudera's largest rivals, Hortonworks, claims to contribute more to Apache Hadoop than its competitors and sells itself on this front. Cutting says he hates “to get sucked into that whole argument about who contributes more,” but after El Reg brought it up said:
“There's all sorts of different metrics you can use and come up with all sorts of different answers.
“If you look at it ecosystem-wide, you know, Hortonworks and Cloudera are the largest contributors and you can find different metrics to make each shine depending on whether you look at lines of code, or numbers of bugs fixed, or whether you look at twenty projects wide, or three projects wide, and which twenty and which three, you can cook the books one way or another.
“The fact is both companies contribute a lot, and that's great.”
Companies' ability to provide quality support is somewhat dependent on their ability to contribute features and nix bugs at Apache.
“There are other vendors who are trying to get established in this space who aren't very significant contributors,” and Cutting reckoned that made it “harder for them to provide high-quality support when they don't have as deep a knowledge and ability to impact these technologies.”
As for Cloudera's commitment, it “tries, and I think succeeds, to have committers on every project we ship and include in CDH. I believe we have committers on the open source projects, and I think that's not true of other vendors. There's a lot of folks who will start shipping a technology that they don't have the in-house expertise to support and so they'll wing it. We've done the best job of only shipping the technology we can support.”
“Us versus them is not how we tend to think about it,” said Cutting, adding that Cloudera's mission statement was about improving things for its customers, which is not too surprising.
“We've been on that path since practically the beginning, and the path hasn't changed much. Our open source strategy has not changed much, not many of these things I'm talking about have changed since 2009 since when we set on this path, and that was several years before Hortonworks was founded,” he said.
“For example, I think MapR was around actually earlier on, and MapR has had its own path which it has stuck to, so you know, I think it's healthy to have multiple vendors with different strategies here and different customers might value different approaches better.”
“I think we're on a very long-term transition in technology platforms people are using for data,” said Cutting.
It will be at least another decade before people are predominantly using this new tool-set in place of RDBMSs for their data, suggested Cutting, who added that he believed Cloudera would have an IPO long before that. At the moment, though, Cloudera is “still in a phase where we're going very quickly,” in terms of both revenue and customers. Unlike Hortonworks, the company remains private and is not required to file public statements on its performance.
“We had this investment from Intel – two years ago now, perhaps – and we got enough cash from that that we don't need to raise any more money,” said Cutting.
It would be surprising if the company did, having picked up around $740m from Intel alone in 2014. “We believe that we don't need to raise more money for… ever, I guess is the point, until we are profitable. We will IPO when we feel like we have a business we can show that is profitable.” ®