Cloudera floats commercial Hadoop distro

Open source for the Google wannabe


Face it: You want to launch your own Google and get your hands on some of that (easy?) internet money. Well, now's your chance to take a stab at it.

Today, a startup called Cloudera is launching a commercial distribution of the Google-inspired open source Hadoop software underpinning Yahoo, Facebook, and a number of other hot-shot web companies.

The Cloudera team includes four founders, all of whom bring different things to the Hadoop table: Christophe Bisciglia, who led the partnership between Google, IBM, and the National Science Foundation to create Hadoop grids for academics to play around with; Amr Awadallah, a former Yahoo vice president of engineering who led the data warehousing and analytics effort behind that company's mail, search, finance, and news services; Mike Olson, formerly the chief executive officer of open source database maker Sleepycat Software (now owned by Oracle); and Jeff Hammerbacher, formerly of social networking giant Facebook and the manager who created the Hive project, a data warehousing layer that works in conjunction with Hadoop and that Facebook uses to analyze the many petabytes of information stored in its user data warehouse.

Hammerbacher is an entrepreneur in residence at venture capitalist Accel Partners, and back in October, Accel kicked in $5m in Series A funding for Cloudera. The startup has also tapped Hadoop creators Doug Cutting and Mike Cafarella as advisors, as well as Diane Greene (founder and former CEO at virtualization specialist VMware) and Mårten Mickos (CEO of MySQL before Sun Microsystems bought it). These and a handful of other tech luminaries are not just advisors, but investors in Cloudera.

According to Christophe, the Hadoop stack that Cloudera is supporting is based on the latest stable releases of the code that is available through the Apache Foundation, where the open source version of Hadoop lives. This includes Hadoop 0.18.3, which has the Hadoop Distributed File System - as the name suggests, a distributed and fault-tolerant file system - and the MapReduce application parallelization and execution environment that works in conjunction with HDFS.
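For readers who have not met the MapReduce model, here is a minimal single-process sketch in plain Python. It is not Hadoop's API and nothing here comes from Cloudera's distribution: the function names (map_words, reduce_counts, run_mapreduce) are made up for illustration, and the in-memory list stands in for files that a real cluster would read from HDFS and process across many machines.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical names throughout; this only illustrates the map -> shuffle -> reduce flow.

def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1

def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce step: combine all values emitted for one key into a total."""
    return word, sum(counts)

def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    """Run mapper over every record, group values by key, then reduce each group."""
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))

# Stand-in for lines of text that would normally live in a distributed file system.
lines = [
    "hadoop stores data in hdfs",
    "mapreduce reads data from hdfs",
]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
print(word_counts)
```

The point of the split is that the map and reduce functions hold no shared state, which is what lets a framework like Hadoop farm them out to many servers and rerun them on another node if one fails.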

The Cloudera Hadoop distro will also include the Hive client library associated with Hadoop (and also available through Apache), but according to Christophe, it doesn't really have version numbers yet. The important thing is that Cloudera found a set of Hive code that works with Hadoop 0.18.3 and that Hive includes a query language called HQL, which allows Hadoop data sets to be queried in a manner that is similar to SQL queries against a relational database.

Olson says that Cloudera was founded last summer, and the company is clearly ramping quickly if it has already secured so much financial and technical backing. And the reason is simple: People want to figure out how to use Hadoop in their own IT operations, but it is a pain in the neck to get it all set up and working.

"Adoption of Hadoop has been slow in mainstream computing because it is still hard to install, build, and maintain a Hadoop cluster," explains Olson. "We are convinced that normal companies are going to be coping with terabytes and petabytes of data, and Hadoop is the most interesting thing to come along in a decade for dealing with large data sets. We want to be the Hadoop company that enterprises come to when they want to crunch those big data sets."

One of the first things Cloudera did with its beta customers when it started alpha trials of its services last fall was get everyone on the same release of Hadoop and Hive. And standardization means not being on the bleeding edge, by the way. Hadoop 0.19 is out, and according to Christophe, it has many needed features. But "much-needed features have come with bugs." These may be shaken out in Hadoop 0.20, but commercial companies that are basing their business on this software don't want to mess around with buggy code.

The analogy with Linux is plain enough: They want something akin to the hardened and slow-changing Red Hat Enterprise Linux, not the Fedora development release.

Hadoop is written in Java, which means it can run on any Java-enabled platform, but Christophe says that 90 per cent of companies deploy it on a Linux operating system and most deploy it on x64 iron. The Cloudera Hadoop distro is packaged up in Red Hat-style RPMs, and the Hadoop functions are available as Linux services, just as a web server is, for instance. The Cloudera package, technically known as the Cloudera Distribution for Hadoop, is also available as an Amazon EC2 image. Because all of the Hadoop code is open source under the Apache 2 license, the Cloudera packages are all available for free.

Cloudera plans to make money selling consulting, training, and support, just like Linux distros do. Pricing has not been announced yet, and Olson was pretty stubborn about the need to keep pricing private until Cloudera gets some more business under its belt. The current pricing metrics, he did say, were based on the size of the Hadoop cluster, including the number of servers and the size of the data sets. ®


Biting the hand that feeds IT © 1998–2022