Databricks buys analytics biz, donates MLflow to Linux Foundation, opens up Delta Engine to boffins

Time to trawl through some data lakes


In a busy 24 hours Databricks has handed over MLflow – a machine learning management tool – to the Linux Foundation, bought an analytics biz and moved the status of its Delta Engine to General Availability.

The vendor said MLflow is already open source with 200 contributors and 2 million downloads per month. The Linux Foundation will give MLflow a vendor-neutral home with an open governance model to broaden adoption and contributions to the project, or at least that's the hope.

At its Spark and AI Summit this week, Databricks also confirmed it had bought Redash, which provides dashboards and visualisation tools describing the state of play in data lakes.

If you're a general business user, it's not for you, David Wyatt, the company's senior vice president and general manager for EMEA, told us, adding that Databricks developed the tools for data engineers and scientists.

"We still have independent software vendor relationships with the likes of Tableau because that industry is huge. We're not trying to take over that [business user] industry."

Meanwhile, Databricks' Delta Engine has hit GA. The engine builds and executes analytics queries and machine-learning models on Delta Lake, the company's data lake technology. It is meant to improve performance when analysing a mix of structured and unstructured data.

Customers such as Comcast, Unilever and Starbucks were achieving up to eight times faster execution speeds using the engine, Wyatt claimed.

"Every data workload you've got can now be executed on the Delta Lake, which is a supreme situation, compared to the old traditional approach where you just managed structured data in a structured way and to do any change or do anything agile was very difficult," he said.

Only makes sense with a new rig

Hyoun Park, chief analyst at Amalgam Insights, said if the new Delta Engine improves performance, it may help Databricks be seen "as a single source to be able to place the majority of your important business or organizational data".

However, organisations that have built reliable queries on structured business information stored in enterprise data warehouse technologies from companies such as Teradata, IBM and Oracle would be unlikely to replace them with Databricks' approach.

"Databricks would be glad to take over Teradata data warehouses and it does happen on occasion, but data warehouses are well designed for predictable relational data," Park said. "I don't think there's any reason to tear apart these investments that companies might have spent millions of dollars on.

"However, Databricks is good for being able to throw a lot of different types of data into, going forward."

Databricks was one of the main vendors behind Spark, a data framework designed to help build queries for distributed file systems such as Hadoop. Matei Zaharia, CTO and co-founder, was the initial author of Spark, which was considered a leap forward in speed and usability compared with Hadoop's query engine MapReduce.

But with Hadoop's heyday drifting into distant memory and a broader set of products out there, the company is competing in a wider world of enterprise data management and analytics. ®


Biting the hand that feeds IT © 1998–2021