Databricks' $1B Tabular buy raises questions around table format wars
Keeping things neutral will be harder, say critics, as tiny startup acquired
In September last year it was a company with around 25 employees, all working remotely. Last night, Tabular was bought by analytics and machine learning platform Databricks for a reported $1 billion.
The reason Databricks, itself still VC-funded, was prepared to splash so much cash on the fledgling vendor is that Tabular's founders were behind the popular data table format Apache Iceberg. They have produced software to accompany it, including a so-called headless data warehouse designed to create a neutral layer between Iceberg data, which might reside across a range of systems, and analytics engines.
In January last year, The Register predicted that Iceberg would change the economics of cloud-based data analytics, and Databricks has now offered a $1 billion vote of confidence in that thesis.
Databricks — nominally worth $43 billion after $4 billion in VC funding — has its own open table format in Delta Lake, which is open source and is run by the Linux Foundation. Critics have argued, though, that a single vendor — Databricks — is always likely to exert more influence on the format. Iceberg, meanwhile, was born out of a Netflix project and is perceived as being more vendor-neutral.
In a prepared statement, Ali Ghodsi, Databricks CEO and co-founder, said the lakehouse paradigm, which brings messy data lakes together with regimented data warehouses, had been split between the two most popular formats: Delta Lake and Iceberg.
"Databricks and Tabular will work with the open source community to bring the two formats closer to each other over time, increasing openness, and reducing silos and friction for customers," he said.
He pointed out that last year, Databricks announced Delta Lake UniForm to bring interoperability to these two formats.
Tabular, for its part, provides the headless data warehouse, a managed catalog integrated with role-based access controls and automated services. It aims to be a neutral layer between data and analytics engines. In September, Tabular closed a $26 million Series B funding round led by Altimeter Capital, with participation from Andreessen Horowitz and Zetta Venture Partners.
The Register first noticed the potential dichotomy between Iceberg and Delta in the summer of 2022, when cloud-based data warehouse vendor Snowflake and former Hadoop data lake specialist Cloudera both backed the Apache project.
Ryan Blue, Tabular co-founder and CEO, helped develop Iceberg at Netflix to solve performance and usability challenges inherent in Apache Hive tables in large, demanding data lake environments.
Speaking to The Register in 2023, Blue said he welcomed Databricks' announcement that it would support Iceberg "as a way of getting access to data in their platform," because he expected Iceberg to become the "ubiquitous choice."
He said there were "concerns about Delta Lake in terms of the neutrality of the format, and the ability for other players to really invest and get the most out of it, because it is so tightly controlled by Databricks." He said Tabular wanted to "stay a neutral layer."
"We actually worked great in Databricks, we want to work with them as a compute partner. They've done a good job building Delta, and that might be a good choice for some people. The people that want this modularity or choice in the [analytics] engine side; it's probably just not a good choice for them," he said in September 2023.
In its announcement about the takeover, Databricks was vague about which format would survive. It said it intended to work closely with the Delta Lake and Iceberg communities to "bring format compatibility to the lakehouse."
In the short term, that would mean working within Delta Lake UniForm, but in the long term, it would mean "evolving toward a single, open, and common standard of interoperability."
Databricks has been asked to provide more detail.
TechMarketView analyst Craig Wentworth said: "Fewer competing data formats in the Lakehouse market (and/or a platform that supports multiple formats well) also increases Databricks' attractiveness to potential new customers who don’t want to be distracted by underlying table format wars."
Whether Databricks provides the neutral layer Blue and Tabular once imagined is another matter. Databricks also provides analytics engines and has a financial interest in people using them.
In the vendor market, Databricks may also be making a play against Snowflake, which this week unveiled the Polaris Catalog, "an open source catalog for Apache Iceberg that enhances data interoperability across various engines and cloud services." Snowflake might wonder what the future holds for Iceberg now that a rival has bought the company founded by the format's original developers. ®