Cloudera execs grab the mic at ex-Hortonworks gig, dish details on new data platform
Go off-cluster if you wanna, plus 'batteries-included' Kubernetes containers
Cloudera, fresh from the uneven merger with former Hadoop distro competitor Hortonworks, used its first major public event to thrust a new data platform hard at the enterprise.
Execs at the not-so-new-look firm (a revamped logo and branding will be revealed soon) are this week taking to the stage at the DataWorks Summit, which was Hortonworks' annual shindig before the corporate mash-up.
At a press and analyst event yesterday ahead of the main conference, top brass fleshed out the firm's new flagship product, the Cloudera Data Platform (CDP), which was trailed as the Hadoop-flinger recently reported what were widely seen as lacklustre financials.
CMO Mick Hollinson in typically understated fashion said Cloudera's "enterprise data cloud" was solely focused on "solving the big data problem for the largest companies on the planet".
Hollinson outlined four elements: multi-function analytics anywhere; the ability to support "every conceivable cloud mechanism"; common support and governance; and that it is still an open platform. Having been historically open-core, he said, Cloudera's distro will now be 100 per cent open source – as was Hortonworks.
The first incarnation of CDP will be delivered in summer. This will run on two public clouds – Azure and AWS – and support both data engineering and data warehousing. Later in the year, or possibly early 2020, there will be a second release supporting private cloud containerization and further analytics functions.
As we pointed out last week, CEO Tom Reilly has said Cloudera can use CDP to compete against AWS because it can sell multi-vendor clouds and hybrid clouds too.
What's in the box?
The technical spec of the platform was added by Fred Koopmans, veep for product management, who touted the service as an answer to all its customers' prayers.
For instance, a "huge driver" for many of its customers is to ensure products are open source and "provide no dead ends", while a hybrid, multi-cloud platform allows them to be prepared for rapid and dramatic changes in infrastructure.
Koopmans noted that both companies last summer introduced major new versions of their platforms for the first time in about five years – but that the "vast majority" of customers were still running previous versions. They will be given a direct upgrade path to CDP, something he said customers commonly ask about.
However, CDP, he said, was a much bigger leap – "a new kind of platform."
It will include a shiny interaction model that means not everyone in a firm has to share the same base clusters or upgrade cycles. Koopmans said this was in response to a common question of how firms can speed up internal access to data, and how they can "be more agile".
CDP, he claimed, addresses this by allowing biz users to deploy new applications off-cluster. There are two options for the application experience: a flexible approach called Distro-X, or a self-service experience that is supposed to be more simplified and constrained. The latter trades flexibility for a lot more automation.
A biz, for example, can build a self-serve data-mart that can be shared with a particular team for a few weeks, and then "throw the whole environment away". This, Koopmans said, was perfect for people who don't want to invest a lot of time and scripting for something "ephemeral".
CDP also brings with it a new computing model – rather than deploying on bare metal, or if in the cloud running on IaaS from Amazon, Google or Microsoft, Koopmans said it will now run on a container platform.
"First off it's virtualized by default, rather than as an afterthought; second it's elastic, so you can grow and shrink these resources much more simply, much more efficiently," he said. This means storage and compute don't have to scale at the same rate.
In the data centre, there will be two deployment models: either the customer provides its own Kubernetes environment, or Cloudera offers a "batteries-included" version. "Most customers don't yet have a general-purpose Kubernetes environment we can run a container on, or if they do it's not really optimised for big data applications," he said.
There will also be a new management framework that Koopmans said would enable much greater scale for applications. With potentially thousands of applications sharing a data lake and computing environment, the exec said it was crucial for customers to have unified control of that, along with automated management of their life cycle. There will also be unified metadata management for common security and governance.
Hollinson previously claimed this common security and governance model helped Cloudera's platform stand out from those of other vendors, as it created a "competitive moat".
"There are many companies that may offer one workload, they offer that with their own set of security and governance models. Then if you buy another workload from another company, you need another [model]," he said. "This is true even inside the large public cloud vendors."
Other elements are new form factors for simplified operation in the cloud, new portability and integration tools, and a new development model that Koopmans claimed will allow faster execution, with updates pushed out twice a month.
There will be expanded data warehouse tools, taking the best elements of each of the distros' toolsets – which had diverged – with the eventual aim of automating the choice for customers, so the best one is selected for each use case.
Koopmans also pointed to new capabilities available for existing customers now, "without any major surgery". For CDH customers, these include a remote cluster management service, an operational element Hortonworks already had. HDP customers will get Cloudera's integrated machine learning model development platform, which aims to help data scientists be more productive.
Hollinson also said that the firm was trying to "teach customers how to fish" – by which he meant that Cloudera would sell them professional services and training – adding that the two Hadoop-flingers would continue their respective business relationships with other firms. ®