Big data vendors introduce Apache Iceberg features in same week

Market rivals settle on open table format, while Microsoft and Databricks go their own way

Apache Iceberg has secured renewed momentum in the last week after leading vendors in data warehousing and analytics all announced new features around the open source table format.

AWS, Cloudera, Google, and Snowflake came out in support of Apache Iceberg. Iceberg faces off against contenders including Databricks' Delta Lake – also an open source Linux Foundation project – and Apache Hudi. They are all battling to become the standard table format, which would allow users to query data with an analytics engine of their choice without moving it.

For example, Google's data warehouse and analytics environment, BigQuery, is previewing BigQuery tables for Apache Iceberg, which it calls a fully managed, Apache Iceberg-compatible storage engine. The Chocolate Factory aims to bring together its data warehouse and data lake technology, BigLake, in a so-called lakehouse architecture.

"BigLake tables are currently read-only; BigQuery customers have to perform data mutations through external query engines and manually orchestrate data management," the vendor explained in a blog post.

"BigQuery tables for Apache Iceberg use the Apache Iceberg format to store data in customer-owned cloud storage buckets while providing a similar customer experience and feature set as BigQuery native tables."

In this way, the new BigQuery tables are also writable from BigQuery through GoogleSQL data manipulation language (DML) and support ingestion from open source engines such as Apache Spark through BigQuery's Write API.

AWS's Redshift is a rival to BigQuery in so-called cloud-native data warehousing. It has introduced secure sharing of data lake tables, which supports open file formats including Parquet, ORC, JSON, and CSV, as well as open table format Apache Iceberg, all stored in Amazon S3.

Cloudera and Snowflake have different histories in the data analytics market. While Cloudera started out building data lakes on the Apache Hadoop (HDFS) system, Snowflake was seen as a leader in separating storage from compute in cloud-based data warehouse systems.

In 2022, both companies backed Apache Iceberg to improve interoperability without moving data.

Last week, Cloudera announced integration with Snowflake by extending its Open Data Lakehouse interoperability, which it said would offer joint customers access to Cloudera's Data Lakehouse via its Apache Iceberg REST Catalog.

In a statement, Abhas Ricky, chief strategy officer of Cloudera, said the move would help customers simplify their data architecture, minimize data pipelines, and reduce the total cost of ownership of their data estate while reducing security risks.

Keen observers will note exceptions to the table format love-in, including Microsoft, provider of Azure, the second-largest cloud infrastructure platform in the market, and a slew of data technologies, including its lakehouse environment Fabric. Microsoft went with Delta Lake, owing to market demand, according to Arun Ulag, corporate vice president of Azure Data. Although Microsoft Fabric provides some support for Iceberg and Hudi by default, Fabric favors Delta and Apache Parquet, the column-oriented data file format.

Databricks, meanwhile, dreams of creating a single standard with the best bits of Iceberg and Delta. While that work progresses, it offers hope of integration via its UniForm product, designed to allow data stored in Delta to be read as if it were Apache Iceberg or Apache Hudi.

Earlier this month, Snowflake principal engineer Russell Spitzer said he hoped the de facto standard would be Iceberg. After recently joining from Apple – where Iceberg is said to be wall-to-wall – he said he was seeing a number of developer groups from vendors and tech firms start to contribute to the Iceberg project. ®
