Hadoop Hive stung into action, swarms around SQL
More relational, more useful to humans, we're promised
Hortonworks has unveiled the Stinger Initiative, a project to make Hadoop’s Hive data warehouse friendlier with SQL and faster.
Hortonworks has also unveiled two accompanying Hadoop projects, which it has submitted to the Apache Software Foundation (ASF) in the hope they become community-supported projects. They are a runtime called Tez and a sign-in and authentication system called Gateway. Both Tez and Gateway are ASF incubator projects.
Hadoop services startup Hortonworks said Stinger would “enhance Hive with more SQL and better performance” for what it called “human-time use cases”.
Translated, Stinger should make Hive friendlier and faster to use in data querying and analytics normally undertaken by SQL and relational tools.
Hive, like the rest of the Hadoop architecture, has thrived on crunching batches of data – Hadoop is an open-source implementation of Google’s MapReduce and a NoSQL system.
However, the NoSQL crowd has realised it needs to make its architectures work better with the SQL-like tools used by businesses in the real world.
The standard SQL interface for Hive was HiveQL, but it doesn't match the latest SQL standard – and support for HiveQL is not widespread, so banking your data infrastructure on it is a bit of a gamble. ASF's HiveQL project web page is deprecated, and simply points you to the HiveQL programming manual.
According to Hortonworks, Stinger will make Hive “a more suitable tool for the decision support queries people want to perform on Hadoop”.
This means the addition of analytics features such as the OVER clause, support for subqueries in WHERE and aligning Hive’s type system with the standard SQL model.
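For readers unfamiliar with the two SQL features named above, here is a minimal sketch of what they look like in standard SQL. It runs against SQLite through Python's built-in sqlite3 module purely as a stand-in, not against Hive, and the `sales` table and its columns are made up for illustration. Whether Hive ends up accepting exactly this syntax is an assumption.

```python
import sqlite3

# Hypothetical sample data; the table and column names are illustrative only.
# Window functions (the OVER clause) need SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 'ann', 100), ('east', 'bob', 300),
        ('west', 'cat', 200), ('west', 'dan', 50);
""")

# OVER clause: rank each rep within their region without collapsing rows,
# which a plain GROUP BY cannot do.
ranked = conn.execute("""
    SELECT region, rep, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY region, rnk
""").fetchall()

# Subquery in WHERE: keep only the reps whose sale beats the overall average.
above_avg = conn.execute("""
    SELECT rep FROM sales
    WHERE amount > (SELECT AVG(amount) FROM sales)
    ORDER BY rep
""").fetchall()

print(ranked)
print(above_avg)
```

Both queries are the sort of thing an analyst would fire off interactively, which is the "human-time" use Hortonworks is talking about.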
The plan is to speed up Hive, too. There’s a new execution engine to increase the number of records per second Hive can process, a new columnar file format to provide “a more modern, efficient and high performing” means to store Hive data, and the Tez runtime framework to speed up workloads by eliminating unnecessary tasks, synchronization barriers, and reads from and writes to HDFS.
A preview of Stinger is planned ahead of the Hadoop Summit in Amsterdam in March. ®