Lakehouse dam breaks after departure of long-time Teradata CTO
Data warehousing giant abandons stance against hybrid analytics
Updated Data warehouse stalwart Teradata has shaken off its aversion to the lakehouse concept, embracing the idea of performing enterprise analytics on unstructured data – a situation it once argued against.
Founded in 1979, the company pioneered enterprise data warehousing in the decades through to the 2010s, but has since been overshadowed by so-called cloud-native data warehouse products, which promise greater flexibility and lower startup costs.
Teradata has now announced support for open table formats (OTFs) Apache Iceberg and Linux Foundation Delta Lake, embracing an industry trend towards performing analytics on data in-situ, rather than moving it into a single store for BI and other analysis.
Teradata claimed that AI adoption had consolidated data warehouses, analytics, and data science workloads into unified lakehouses. "OTF support further enhances Teradata's lakehouse capabilities, providing a storage abstraction layer that's designed to be flexible, cost-efficient, and easy-to-use," it said in a corporate missive.
The lakehouse concept originates with Teradata rival Databricks, a machine learning and analytics company built around Apache Spark. Databricks launched the concept in 2020 as a hybrid approach, bringing better governance to the data lakes where organizations store messy data while allowing SQL-based analytics in-situ.
Until 18 months ago, Teradata eschewed the lakehouse concept. Speaking to The Register in late 2022, former CTO Stephen Brobst said that a data lake and data warehouse should be discrete concepts within a coherent data architecture, playing to the vendor's historic strengths in query optimization and thousand-user concurrency.
"You need to have a unified architecture, but they are discrete things. There is a difference between the raw data, which is really data lake, and the data product, which is the enterprise data warehouse," Brobst said.
Although Teradata launched its own data lake in August, in part by improving optimization for object stores such as AWS S3, Brobst said there was an important distinction between raw data and the data warehouse, with the latter optimizing query performance and governance controls.
Teradata's decision to execute a dramatic volte-face is perhaps related to the departure of Brobst, who left the company he helped develop in January after more than 24 years.
Teradata claims its adoption of OTFs Delta Lake and Iceberg brings a "forward-looking dimension to Teradata VantageCloud Lake," which is now available and offers "cloud-native analytics and data platform for AI".
Never mind that rival vendors have already made their positions on Delta Lake, Iceberg, and Hudi – another OTF – clear, in some cases nearly two years ago.
Apache Iceberg is an OTF designed for large-scale analytical workloads and supported by query engines including Spark, Trino, Flink, Presto, Hive, and Impala. It has spent the last couple of years gathering momentum after Snowflake, Google, and Cloudera announced their support in 2022. More specialist players are also in on the act, including Dremio, Starburst, and Tabular, which was founded by the team behind the Iceberg project when it was developed at Netflix.
Databricks is behind the Delta table format, but says it is fully open source as it is managed by the Linux Foundation. Last year, SAP and Microsoft announced support for Delta, but both said they could address data in Iceberg and Hudi in time.
Last week, CRM company Salesforce reinforced its commitment to Apache Iceberg. In a statement to The Register, it said it was contributing to the open source project and worked with data warehouse and data lake partners Snowflake, Google BigQuery, Amazon Redshift, Databricks, and Microsoft (Fabric). It would not confirm its approach to Delta Lake.
Across the OTFs, the goal is roughly the same: to bring the analytics engine of choice to the data, without going through the cost and effort of moving the data. Teradata's story has always focused on bringing data into one place and giving it structure, emphasizing optimized queries and high-performance concurrency. What that means in light of its newfound support for OTFs and the data lakehouse remains an open question. The company has been offered the opportunity to respond. ®
Updated to add:
A Teradata spokesperson said: "Teradata is committed to being open and connected, and embracing AI and analytics in the cloud, built around data warehouses, data lakes, data lakehouses, and data in object stores.
"In 2022, we launched a new fully cloud-native data and analytics platform—VantageCloud Lake with ClearScape Analytics—complete with lakehouse capabilities. As part of our leadership in modern data architectures, our open table format (OTF) announcement this week publicly launches support for both Iceberg and Delta, with the performance and cost governance that enterprises expect.
"It's still early days for mainstream adoption of OTFs, but Teradata is committed to enhancing and providing these capabilities to our customers. In addition, we are going further than many who came before us by integrating OTFs into our core architecture, driving business value via greater performance and efficiency. As the only company offering cross-read and cross-write across open catalogs, we are demonstrating both our innovation and our steadfast commitment. This is in line with Teradata's long-time focus on helping customers achieve demonstrable business value from integrated data, and our evolution to a Trusted AI platform, including LLMs and gen AI."