Linux lessons for Hadoop doubters
Before IBM there was Linus
Open...and Shut
While Hadoop is all the rage in the technology media today, it has barely scratched the surface of enterprise adoption. In fact, we are still only taking the first few steps of the Big Data marathon, a race that Hadoop seems set to win despite its many shortcomings.
The big question will be whether the market will keep the Hadoop faith as these shortcomings are resolved. All indications suggest that it will.
As The Wall Street Journal recently highlighted, the pace at which new technologies are adopted has accelerated in the past few years. Even so, as The Atlantic's Alexis Madrigal points out, it actually takes a long time for new technologies to catch on, even in today's fast-paced environment. And, importantly: "In many cases, more time was spent going from zero to one percent [market] penetration than from one to 50."
In Hadoop Land, we're still in the transition from zero per cent adoption to one per cent adoption.
Part of the reason is Hadoop's own shortcomings. Despite being a big proponent of Hadoop, IBM points to a few specific deficiencies that hold it back: limited performance and scalability, inflexible resource management, and reliance on a single distributed file system rather than support for multiple data sources.
IBM, of course, promises to resolve these issues with its proprietary complements to Hadoop, and it is not alone among the relational database vendors in trying to shame Hadoop for being a poor RDBMS. Still, IBM isn't wrong: Hadoop has significant problems.
One of the biggest is that Hadoop is batch-oriented in a world increasingly run in real time. Loggly and Webtrends have both been quick to call out this gap, and I'm not an unbiased observer either. After all, my own company, Nodeable, was established to add real-time capabilities to Hadoop.
So lots of vendors want to fix Hadoop's problems. Meanwhile, customers are buying big into Hadoop.
Mike Olson, chief executive of Cloudera, the biggest standalone Hadoop vendor, called the attempt to sully Hadoop's reputation "desperation FUD" in an email to me. He cited Cloudera's traction with customers and partners. He's right, but given how early we are in the Hadoop adoption curve, it's still possible that alternatives such as Percolator will claim the Hadoop crown.
Possible, but not very likely.
This isn't, after all, consumer technology, which changes with the wind. Instagram went from zero to 50 million users in a little over a year, but enterprise technology adoption simply doesn't work that way.
Back in 2000, IBM announced that it was going to invest $1bn in advancing the Linux operating system. This was big news for those of us who supported Linux distributions back then, but it came roughly 10 years after Linus Torvalds released the first Linux source code, and it took another 10 years before Linux really came to dominate the industry.
Today we take it for granted that startups, clouds and other new ventures will default to Linux as their operating system, but for years after IBM's investment IT departments still chafed at putting Linux in their data centres.
Once the momentum got rolling behind Linux, though, there really was no going back. Microsoft tried to FUD Linux into submission, but there was simply too much industry adoption of open-source Linux to halt it.
The same seems true of Hadoop today. Yes, it has problems, just as Linux did back in 1991, or even 2001. But Hadoop already has the kind of industry backing that took Linux years to gather. IBM, Oracle, Microsoft, Cloudera, Hortonworks, Yahoo!, Intel, NetApp, Facebook, Cisco, and more are all behind Hadoop in a big way.
And so are customers. Once Hadoop goes into their data centres, IT departments are simply not going to rip and replace it with the next shiny Big Data object. They won't do so until the industry as a whole pushes them to, because the enterprise hunts in packs, and the "pack" is currently firmly behind Hadoop.
All of which is why I think we're going to see Hadoop produce the next Oracle-sized database company. We're also likely to see such a company emerge from the NoSQL ranks, but Hadoop is a near-certain bet right now. Cloudera currently has the lead, but again, we're just starting the marathon, one that will produce cost savings for customers, fat bank balances for vendors, and several big exits for venture capitalists.
Hadoop, in short, is a gift that will keep giving for many years to come. It's not guaranteed, but it's about as close to a guarantee as the tech industry has. ®
Matt Asay is senior vice president of business development at Nodeable, offering systems management for managing and analysing cloud-based data. He was formerly SVP of biz dev at HTML5 start-up Strobe and chief operating officer of Ubuntu commercial operation Canonical. With more than a decade spent in open source, Asay served as Alfresco's general manager for the Americas and vice president of business development, and he helped put Novell on its open source track. Asay is an emeritus board member of the Open Source Initiative (OSI). His column, Open...and Shut, appears three times a week on The Register.