For the world's most lauded open source data platform, Hadoop is remarkably difficult to use, so Tuesday brings another company slinging a tool that entices managers and analysts into fiddling with the elephant.
This time it's analytics startup Platfora with the general release of its in-memory business intelligence layer atop Hadoop. Unlike rival BI engines, Platfora lets you interrogate your Hadoop-stored data via a graphical user interface – no need for terminal here, folks*.
Platfora is an "exploratory BI interface in the spirit of Tableau, Spotfire [but] built natively for the [Hadoop] stack. ... the primary interface is definitely a visual way of working with data," Platfora chief and former head of products for EMC Greenplum Ben Werther told The Register.
The company's plan to make Hadoop as easy to query as possible has struck a chord with the venture capital community, which smelled money and pumped $20m into the company in November 2012.
Its GUI-heavy approach stands out from other methods of interrogating HDFS. Alternate tools designed to make the obtuse platform accessible work either by layering a SQL engine on top of Hadoop (Concurrent, EMC/Greenplum's HAWQ), making do with the worthy-but-clumsy Hive (Intel), or by pulling the data into another more friendly analytics system, such as ParAccel.
Though these systems can be useful – and in the case of Cloudera's query layer Impala or EMC/Greenplum's HAWQ, much faster – they lack the ease-of-use features of Platfora, Werther says.
Platfora can also be accessed via SQL-like and JSON-like APIs, but this is not the priority, he said.
The technology also competes with standard BI tools such as Tableau, QlikView, and Tibco Spotfire. "These are all fine solutions in a traditional SQL world," Werther says. "They claim they want to be Hadoop and work in a Hadoop world, but they don't have any of the architecture necessary to make this a first-class experience."
Platfora integrates directly with Hadoop, so companies do not need to suck the data into another ETL or data warehouse, he explained.
The technology has three layers – the web-based exploratory BI layer, a scale-out columnar-compressed in-memory engine, and the Hadoop data refinery which runs MapReduce jobs across HDFS data.
Platfora works by grabbing samples of data from HDFS to create a catalog that can be accessed via the web GUI. The system can handle delimited data, Avro, JSON, log records, regex-parseable data, and "other formats," Werther said. When users select the particular data they want to analyse, the system automatically plans and runs a series of MapReduce jobs that spew data into a partitioned, columnar-compressed dimensional data mart, which Platfora calls a "lens." The resultant blocks of data are then pulled into the Platfora nodes and triple-replicated across disks for redundancy. When a user makes a query, the relevant pieces are pulled into memory.
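For readers who want to picture that flow, here is a toy Python sketch of the sample, aggregate, then query-from-memory pattern. Platfora has not published its internals, so everything here is an assumption made for illustration: the record layout, the field names, and the functions `build_catalog`, `build_lens`, and `query` are all hypothetical, and a plain loop stands in for the MapReduce jobs.

```python
from collections import defaultdict

# Hypothetical raw records standing in for delimited log lines in HDFS.
RAW_LINES = [
    "2013-03-01,us,checkout,20",
    "2013-03-01,uk,checkout,35",
    "2013-03-02,us,browse,0",
    "2013-03-02,us,checkout,15",
]
FIELDS = ["date", "country", "action", "amount"]


def build_catalog(lines, sample_size=2):
    """Inspect a small sample of records to guess each field's type."""
    sample = [line.split(",") for line in lines[:sample_size]]
    catalog = {}
    for i, name in enumerate(FIELDS):
        is_numeric = all(row[i].isdigit() for row in sample)
        catalog[name] = "int" if is_numeric else "str"
    return catalog


def build_lens(lines, dimensions, measure):
    """Sum the measure per combination of dimensions, then lay the
    result out column by column (a stand-in for the 'lens')."""
    dim_idx = [FIELDS.index(d) for d in dimensions]
    m_idx = FIELDS.index(measure)
    totals = defaultdict(int)
    for line in lines:
        row = line.split(",")
        key = tuple(row[i] for i in dim_idx)   # map: emit (key, value)
        totals[key] += int(row[m_idx])         # reduce: sum per key
    lens = {d: [] for d in dimensions}
    lens[measure] = []
    for key in sorted(totals):
        for d, v in zip(dimensions, key):
            lens[d].append(v)
        lens[measure].append(totals[key])
    return lens


def query(lens, dimension, value, measure):
    """Answer a question from the in-memory columns alone."""
    return sum(m for d, m in zip(lens[dimension], lens[measure]) if d == value)


catalog = build_catalog(RAW_LINES)
lens = build_lens(RAW_LINES, ["date", "country"], "amount")
us_total = query(lens, "country", "us", "amount")
```

The point of the design is that the expensive pass over raw data happens once, when the lens is built; later questions such as `us_total` touch only the small pre-aggregated columns. Compression, partitioning, and replication are left out of the sketch entirely.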
Perhaps the technology most similar to Platfora is SAP HANA, with both companies having the same belief about analytics – if you can, do it from memory. However, SAP is focused on bridging SAP transactional data and keeping all of it in memory, Werther said, while Platfora is more about providing a way to interface with a massive pool of HDFS data and selectively load it into memory.
The company has no special plans for an intermediary storage layer, like flash, Werther said. Pricing is done on a per-node basis, but was not disclosed.
There's a feeling brewing among users and developers that big-data tools cost too much and do too little, probably emanating from the eye-watering salaries needed to support Hadoop-whisperers and the fact that although these people may speak HDFS, they might not be the best at designing queries for it. Platfora's strategy of making money by prettying-up Hadoop is representative of the overall big-data industry, which is waking up to the fact that if HDFS truly is becoming the all-purpose storage format for ingested data, then there's money to be made by designing tools to let more people analyse it. ®
* This begs the question as to how easy-to-use a data analysis system needs to be – after all, nothing is more dangerous for an organization than the pointy-haired denizens of the upper floors suddenly being able to query all stored data and develop opinions about what the business should really be doing, right?