Hortonworks adds Ambari control freak to Hadoopery

Downloads and business are booming for open source data munching

Hortonworks, the Yahoo! spinoff that has emerged as the third supplier of commercial support for the Apache Hadoop big-data muncher, is ramping up its Data Platform to the 1.2 release, which includes better security, enhancements to HCatalog data sharing and HBase database services that ride on top of Hadoop, and integration of the Ambari graphical management tool that was in tech preview when Hortonworks kicked out its 1.0 release last June.

John Kreisa, vice president of marketing at Hortonworks, tells El Reg that the integration of the Ambari Hadoop installation and management tool is the most important new feature being added with Hortonworks Data Platform 1.2. At the moment, Ambari only runs on Red Hat Enterprise Linux 5 and 6 and its CentOS clones, but the Apache sub-project that hangs off Hadoop is working on support for SUSE Linux Enterprise Server 11, and hopes to have it available soon.

In the initial commercial Hadoop releases from Cloudera and MapR Technologies, the system-management tools were closed source and fee-based, or were given away as part of a support contract sold to cover the open source parts of the Apache Hadoop stack. The fact that Ambari is just bundled with the HDP stack and is, like the rest of that stack, completely open source is seen by Hortonworks as being a key differentiator.

With the Ambari tool, Hadoop administrators can set up, configure, and troubleshoot the big data–munching clusters, including graphical representations of MapReduce jobs running on the system and performance metrics to show how jobs are doing and what resources are loading up what nodes in the cluster.

The Ambari tool also has a database that keeps track of the historical performance of jobs and the nodes in the cluster so you can see how these change over time, and also has an "instant insight" health-check button for all of the core elements of the Hadoop stack. There is also a cluster-navigation feature that allows you to jump into the NameNode settings for any particular node in the cluster.

The NameNode, by the way, is the key part of a Hadoop cluster that keeps track of where data is replicated in the system and which knows where to dispatch computing jobs to those data chunks for it to be chewed on. The JobTracker keeps track of what MapReduce jobs are running around the cluster, chewing on data like locusts swarming a crop.

Ambari, which is coded in Java like the core Hadoop MapReduce and Distributed File System code, and which runs on a Tomcat or other Java application server, also has a REST interface that allows you to programmatically go into the Hadoop cluster and initiate commands, as well as integrating with Microsoft System Center, Teradata Viewpoint, or other system-management tools common in the data center.

Screenshot of the Ambari Hadoop management client in HDP 1.2

Screenshot of the Ambari Hadoop management client in HDP 1.2

The Ambari tool will work with the original HDP 1.0 release as well as the HDP 1.1 release that came out in September 2012, and it will also help do cluster upgrades as customers upgrade their HDP releases. With Ambari in place, you can upgrade the Hadoop stack all at once or in piecemeal fashion, module by module. But Kreisa says that the intent is to use Ambari with the HDP 1.2 release and get everything current.

"We are never more than six commits away from the open source trunk, where our competition is a thousand commits away," brags Kreisa.

He was exaggerating the latter number for effect, but stressed that because HDP was a 100 per cent open source distribution of Hadoop, and one that is created by the core Yahoo! team, being current was one of the advantages that it could bring to bear. While Hortonworks got a later start, Kreisa says that the rate of change with the pure open source projects is going to be much higher and will soon blow by the open-core variants of Hadoop from Cloudera and MapR.

The HDP 1.2 release also has an enhanced security model for the Hive data warehousing and ad hoc querying layer for HDFS and the HCatalog metadata service. The latter is a table and storage management service that allows MapReduce, Pig, Hive, and Streaming – a kind of super-scripter for MapReduce jobs – to share data across those modules even though it is stored in different formats.

The security in the HDP 1.2 stack has a "pluggable authentication model" that restricts access to Hive while at the same time allowing multiple concurrent queries to be processed by Hive concurrently. This code also has stability improvements and performance tweaks, and there is a new ODBC driver that comes from Simba Technologies that can suck data out of and pump date it into Hive. (This ODBC driver is available on Cloudera and MapR distributions, by the way.)

The other thing that Hortonworks is eager to talk about aside from HDP 1.2 is how business is going. Back in June 2012, when HDP 1.0 came out, the company had around 75 employees, and in six months it has added 100 more in engineering, sales, marketing, and support roles to help feed its growing business.

While not naming names or giving out specific numbers, Kreisa says that it has had tens of thousands of downloads of the HDP stack and has a couple dozen "very big-name" paying customers running its HDP stack on production clusters.

Teradata and Microsoft have both linked tails and noses with Hortonworks for their Hadoopery. Teradata partnered with Hortonworks as HDP 1.0 was coming out to integrate Hadoop with its Aster Data database, and a couple of these deals have closed already. Microsoft will soon ship its HDInsight Hadoop service, which is in beta testing right now on its Azure cloud.

Hortonworks does not provide a full list price for its Hadoop distro, and interestingly it is charging for its support contracts based on the size of a cluster instead of on a per-node basis like its rivals. A standard support contract (biz hours, five days a week) runs $12,000 per year for a ten-server cluster. ®

Other stories you might like

  • DigitalOcean tries to take sting out of price hike with $4 VM
    Cloud biz says it is reacting to customer mix largely shifting from lone devs to SMBs

    DigitalOcean attempted to lessen the sting of higher prices this week by announcing a cut-rate instance aimed at developers and hobbyists.

    The $4-a-month droplet — what the infrastructure-as-a-service outfit calls its virtual machines — pairs a single virtual CPU with 512 MB of memory, 10 GB of SSD storage, and 500 GB a month in network bandwidth.

    The launch comes as DigitalOcean plans a sweeping price hike across much of its product portfolio, effective July 1. On the low-end, most instances will see pricing increase between $1 and $16 a month, but on the high-end, some products will see increases of as much as $120 in the case of DigitalOceans’ top-tier storage-optimized virtual machines.

    Continue reading
  • GPL legal battle: Vizio told by judge it will have to answer breach-of-contract claims
    Fine-print crucially deemed contractual agreement as well as copyright license in smartTV source-code case

    The Software Freedom Conservancy (SFC) has won a significant legal victory in its ongoing effort to force Vizio to publish the source code of its SmartCast TV software, which is said to contain GPLv2 and LGPLv2.1 copyleft-licensed components.

    SFC sued Vizio, claiming it was in breach of contract by failing to obey the terms of the GPLv2 and LGPLv2.1 licenses that require source code to be made public when certain conditions are met, and sought declaratory relief on behalf of Vizio TV owners. SFC wanted its breach-of-contract arguments to be heard by the Orange County Superior Court in California, though Vizio kicked the matter up to the district court level in central California where it hoped to avoid the contract issue and defend its corner using just federal copyright law.

    On Friday, Federal District Judge Josephine Staton sided with SFC and granted its motion to send its lawsuit back to superior court. To do so, Judge Staton had to decide whether or not the federal Copyright Act preempted the SFC's breach-of-contract allegations; in the end, she decided it didn't.

    Continue reading
  • US brings first-of-its-kind criminal charges of Bitcoin-based sanctions-busting
    Citizen allegedly moved $10m-plus in BTC into banned nation

    US prosecutors have accused an American citizen of illegally funneling more than $10 million in Bitcoin into an economically sanctioned country.

    It's said the resulting criminal charges of sanctions busting through the use of cryptocurrency are the first of their kind to be brought in the US.

    Under the United States' International Emergency Economic Powers Act (IEEA), it is illegal for a citizen or institution within the US to transfer funds, directly or indirectly, to a sanctioned country, such as Iran, Cuba, North Korea, or Russia. If there is evidence the IEEA was willfully violated, a criminal case should follow. If an individual or financial exchange was unwittingly involved in evading sanctions, they may be subject to civil action. 

    Continue reading
  • Meta hires network chip guru from Intel: What does this mean for future silicon?
    Why be a customer when you can develop your own custom semiconductors

    Analysis Here's something that should raise eyebrows in the datacenter world: Facebook parent company Meta has hired a veteran networking chip engineer from Intel to lead silicon design efforts in the internet giant's infrastructure hardware engineering group.

    Jon Dama started as director of silicon in May for Meta's infrastructure hardware group, a role that has him "responsible for several design teams innovating the datacenter for scale," according to his LinkedIn profile. In a blurb, Dama indicated that a team is already in place at Meta, and he hopes to "scale the next several doublings of data processing" with them.

    Though we couldn't confirm it, we think it's likely that Dama is reporting to Alexis Bjorlin, Meta's vice president of infrastructure hardware who previously worked with Dama when she was general manager of Intel's Connectivity group before serving a two-year stint at Broadcom.

    Continue reading
  • Lithium production needs investment to keep pace with battery demand
    Report says $42b will need to be poured into industry over next decade

    Growing demand for lithium for batteries means the sector will need $42 billion of investment to meet the anticipated level of orders by the end of the decade, according to a report.

    Lithium is used in batteries that power smartphones and laptops, but there is also rising use in electric vehicles which is putting additional pressure on supplies.

    The report, Benchmark Mineral Intelligence, predicts that demand will reach 2.4 million tons of lithium carbonate equivalent by 2030, roughly four times the 600,000 tons of lithium forecast to be produced this year.

    Continue reading

Biting the hand that feeds IT © 1998–2022