MapR Technologies, one of the commercializers of the Hadoop big data muncher, has pocketed another $30m to help it ramp up its business and keep it on track for what the company hopes will be an initial public offering.
While Cloudera was out of the gate early commercializing the Hadoop big data muncher, MapR was close behind (by a matter of weeks) and no Hadoop distie has yet emerged as the inevitable Red Hat for fat and fast data.
There are plenty of other contenders, all of them doing interesting things to and with Hadoop, including (in no particular order) MapR; Hortonworks, the direct spinout from Yahoo!; the spinning-out Pivotal unit of EMC; IBM, which has sold its own BigInsights variant of Hadoop for a few years; and now Intel, which has just announced its own Hadoop distro.
What is amazing is that Yahoo! spun Hortonworks out in the first place instead of leveraging it as a strategic asset, and that software-hungry Hewlett-Packard and Dell have not snapped up Cloudera or MapR to build out their software portfolios.
Every day that passes, Cloudera and MapR get more and more expensive, to the point where HP and Dell must both be tempted either to give up on owning a distribution of their own or to grab the various Apache components and roll one themselves.
With the big data market (which means subscription support for open source components plus licensing for proprietary software extensions and the hardware to run it all) expected to reach $5bn in revenues by 2016 or so, there would seem to be plenty of room for multiple contenders. But markets have tended in the past to produce a few dominant players, and MapR wants to be one of them in the big data world.
But with the advent of cloud platform services like Amazon Web Services' Elastic MapReduce, Google's BigQuery, or the eponymous service from Splunk, many companies may simply never install their own big data software. And others with the technical resources may decide that Hadoop is a strategic enough infrastructure and application layer that they should build their own in-house expertise.
And so it is not a foregone conclusion at this point in the big data game that Hadoop will precisely track the history of the Linux operating system or that a dominant player like Red Hat will emerge. The market could remain highly fragmented.
None of the Hadoop disties want to think about that possibility, and they certainly want to be able to leverage what must be some pretty high multiples to either go public or sell out to the tier one IT system suppliers who are desperate to build up their software and services businesses.
"We've got a management team that is not looking for a quick exit," Jack Norris, vice president of TKTK, tells El Reg. "This is a paradigm shift, this is a new architecture. We are focused on an IPO, and John has the Splunk IPO on his desk and he looks at it often. We think we have an even bigger opportunity." Norris was referring to John Schroeder, co-founder and CEO of MapR.
MapR's equity backers think it has a bigger opportunity than Splunk, too. In the first two rounds of funding from Lightspeed Venture Partners, Redpoint Ventures, and NEA, MapR was able to raise $29m and get several generations of Hadoop distributions into the field. The company, being privately held, does not provide revenue figures or customer counts, but has grown to 150 employees. The company's second round helped MapR open offices in London and Munich as part of its expansion in Europe.
This time around with the $30m in Series C funding, Mayfield Fund is leading the investment (with all three other equity players kicking in more dough), and Norris says the plan is to use it to expand into Asia while at the same time boosting its research and development to extend the MapR Hadoop stack.
The current M7 Hadoop distro marries MapR's file system, which makes the Hadoop Distributed File System (HDFS) look like NFS to applications, with HBase, the BigTable-style distributed database that runs on top of HDFS, and in doing so significantly speeds up queries against HBase tables on Hadoop clusters.
That HBase speedup debuted back in October 2012. It basically pushes HBase down into MapR's distributed NFS file system, shards both data chunks and portions of HBase tables, and spreads them around the cluster for performance, while still presenting them to applications as unified data and tables.
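For readers who want to see what "shard the table, but present it as one" means in practice, here is a toy Python sketch. It assumes a simple key-range partitioning scheme, and the class, node names, and split keys are all hypothetical illustrations; it is not MapR's or HBase's actual implementation, which add replication, automatic region splitting, and on-disk storage.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class ShardedTable:
    """Toy table whose rows are split by key range across several 'nodes'.

    Hypothetical illustration only, not MapR's or HBase's real design.
    """

    def __init__(self, split_keys: List[str], nodes: List[str]) -> None:
        # n split keys define n + 1 key ranges, one range per node
        if len(nodes) != len(split_keys) + 1:
            raise ValueError("need exactly one more node than split keys")
        self.split_keys = sorted(split_keys)
        self.nodes = nodes
        # Per-node storage: node name -> {row key -> value}
        self.storage: Dict[str, Dict[str, str]] = {n: {} for n in nodes}

    def node_for(self, key: str) -> str:
        # bisect_right finds which key range the row key falls into
        return self.nodes[bisect.bisect_right(self.split_keys, key)]

    def put(self, key: str, value: str) -> None:
        self.storage[self.node_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        # The caller never needs to know which node holds the row
        return self.storage[self.node_for(key)].get(key)

    def scan(self) -> Iterator[Tuple[str, str]]:
        # Ranges are ordered, so walking the nodes in order and sorting
        # each shard yields one globally sorted view of the whole table
        for node in self.nodes:
            shard = self.storage[node]
            for key in sorted(shard):
                yield key, shard[key]


table = ShardedTable(split_keys=["g", "p"], nodes=["node-a", "node-b", "node-c"])
for k, v in [("apple", "1"), ("kiwi", "2"), ("zebra", "3"), ("grape", "4")]:
    table.put(k, v)

print(table.node_for("kiwi"))  # node-b
print(list(table.scan()))  # [('apple', '1'), ('grape', '4'), ('kiwi', '2'), ('zebra', '3')]
```

The point of the design is that rows live on different machines, yet `get` and `scan` hide that placement, which is roughly the trick the paragraph above describes.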
MapR is very keen on its Apache Drill add-on for Hadoop, which is trying to bring to the Hadoop stack the kind of realtime, interactive querying we have had for relational databases for decades. Just as HBase sort of clones Google's BigTable overlay for its Google File System, Drill mimics Google's Dremel query tool, which uses an SQL-alike language called DrQL. Both Drill and the Google BigQuery service support DrQL.
All of the Hadoop disties are, of course, chasing the same dream. Cloudera has its Project Impala layer for HDFS to replace the Hive SQL-alike query layer for Hadoop, and the EMC Pivotal group spinoff announced last week has taken the SQL guts out of the Greenplum parallel database and woven them into HDFS to create Project Hawq, which speaks actual SQL to sort through data stored in HDFS.
MapR is still the only Hadoop distie that can make HDFS speak NFS, but all of the big players are working on something that tries to make HDFS speak SQL, the default query language for relational databases, to one degree or another.
The investment by Mayfield Fund is not a particularly good indicator of whether MapR will end up being sold or will actually make a debut on Wall Street. The venture capital firm, established in 1969, has invested in over 500 companies. Of these, more than 100 have been sold off in mergers or acquisitions and more than 100 have gone public. ®