Databricks buys analytics biz, donates MLflow to Linux Foundation, opens up Delta Engine to boffins

Time to trawl through some data lakes

In a busy 24 hours, Databricks has handed over MLflow – a machine learning management tool – to the Linux Foundation, bought an analytics biz and moved its Delta Engine to General Availability.

The vendor said MLflow is already open source with 200 contributors and 2 million downloads per month. The Linux Foundation will give MLflow a vendor-neutral home with an open governance model to broaden adoption and contributions to the project, or at least that's the hope.

At its Spark and AI Summit this week, Databricks also confirmed it had bought Redash, which provides dashboards and visualisation tools describing the state of play in data lakes.

If you're a general business user, Redash is not for you, David Wyatt, the company's senior vice president and general manager for EMEA, told us, adding that Databricks developed the tools for data engineers and scientists.

"We still have independent software vendor relationships with the likes of Tableau because that industry is huge. We're not trying to take over that [business user] industry."

Meanwhile, Databricks' Delta Engine has hit GA. The engine builds and executes analytics queries and machine-learning models on Delta Lake, the company's data lake technology, and is meant to improve performance when analysing a mix of structured and unstructured data.

Customers such as Comcast, Unilever and Starbucks were achieving up to eight times faster execution speeds using the engine, Wyatt claimed.

"Every data workload you've got can now be executed on the Delta Lake, which is a supreme situation, compared to the old traditional approach where you just managed structured data in a structured way and to do any change or do anything agile was very difficult," he said.

Only makes sense with a new rig

Hyoun Park, chief analyst at Amalgam Insights, said if the new Delta Engine improves performance, it may help Databricks be seen "as a single source to be able to place the majority of your important business or organizational data".

However, organisations that have built reliable queries on structured business information stored in enterprise data warehouse technologies from companies such as Teradata, IBM and Oracle would be unlikely to replace them with Databricks' approach.

"Databricks would be glad to take over Teradata data warehouses and it does happen on occasion, but data warehouses are well designed for predictable relational data," Park said. "I don't think there's any reason to tear apart these investments that companies might have spent millions of dollars on.

"However, Databricks is good for being able to throw a lot of different types of data into, going forward."

Databricks was one of the main vendors behind Spark, a data framework designed to help build queries for distributed file systems such as Hadoop's HDFS. Matei Zaharia, CTO and co-founder, was the initial author of Spark, which was considered a leap forward in speed and usability compared with Hadoop's query engine MapReduce.

But with Hadoop's heyday drifting into distant memory and a broader set of products out there, the company is competing in a wider world of enterprise data management and analytics. ®


Biting the hand that feeds IT © 1998–2020