AWS and IBM Netezza come out in support of Iceberg in table format face-off
Join Snowflake, Cloudera, Google as Apache format fans
Cloud giant AWS has picked table format Apache Iceberg to extend the reach of its Redshift data warehouse to data lakes, in a move replicated by IBM's Netezza last week.
AWS revealed that it was previewing support for Iceberg, which emerged from Netflix in the late 2010s, allowing users to run Redshift analytics queries on Apache Iceberg tables held in external data lakes.
"You can now use Amazon Redshift to query your Apache Iceberg tables in AWS Glue Data Catalog while other users or applications can safely conduct data manipulation on your tables using ACID-compliant services like Amazon EMR, Amazon Athena, and AWS Glue," it said.
The fine print introduced some caveats, though. "New Iceberg tables only – Queries on partitioned tables which were converted from Apache Parquet tables to Apache Iceberg tables and include partition columns in the query are not supported," it said in an accompanying user guide.
AWS later clarified how the system could be used to query data outside its cloud platform.
"Amazon Redshift provides transactional consistency for querying Apache Iceberg tables from data lakes in AWS (including Amazon S3). To run analytics on external data sources (including Google BigQuery or Google Cloud Storage), AWS customers can use Amazon Athena's prebuilt data source connectors," the company told The Register.
It said that pricing would be based on Redshift Spectrum or Redshift Serverless usage.
Another fillip for Iceberg comes from IBM's Netezza, the almost-forgotten data warehouse originally based on PostgreSQL. We last heard from Netezza when IBM, which bought it in 2010, finally moved the system to the cloud.
IBM software engineer Mike DeRoy blogged this week that users can employ IBM's lakehouse technology watsonx.data to create tables in the Apache Iceberg table format, "allowing any compatible engine to access the data and preventing you from being locked in to any specific engine."
"IBM is bringing first class lakehouse integration into the Netezza engine, allowing you to query Iceberg tables from both the watsonx.data platform, as well as other datalake platforms," he said.
Who's sitting at which table?
Although hardly the Betamax vs VHS standards face-off, the big-hitting vendors seem divided over which table format to back as they pursue the vision of taking analytics engines to the data, wherever it lives. Snowflake, Cloudera, Google and now AWS and Netezza have gone with Iceberg. But Microsoft, SAP and Databricks have picked Delta Lake, the table format Databricks created, whose open source project is managed by the Linux Foundation.
Each vendor has justified its approach by saying its chosen format reflects what customers are demanding most. They have also said they would support a range of formats, including Apache Hudi, in the fullness of time.
Which leaves Oracle. Earlier this month Big Red said it was extending MySQL HeatWave to query data held in object storage. That means its own object storage, of course. Oracle did say, though, that it intends to support open table formats in the future, starting with Iceberg and Delta Lake. ®