This article is more than 1 year old

Teradata takes on cloud-native rivals with data lakes, MLOps

Reworked use of object storage changes footing with competitors but will need to convince devs

Teradata has launched analytics and data lake platforms as it strives to steal a march on so-called cloud-native enterprise data warehouse companies.

With ClearScape Analytics, the data warehousing stalwart has launched 50 new in-database time series and ML functions designed to support end-to-end machine learning pipelines.

The company has also embraced cloud-based data lakes with a product called VantageCloud Lake. Its cloud data warehouse platform, Teradata Vantage, has been renamed VantageCloud Enterprise.

The last few years have seen a confluence of companies around the lakehouse concept. Despite the dubious moniker, it represents a trend toward bringing together data warehouse workloads (repeated analytics on structured data) with data lakes (semi-structured data repositories for more exploratory analyses).

From the data lake side, Databricks has announced Databricks SQL Serverless, designed to improve query performance and concurrency of BI and analytics workloads on its data lake. Cloudera similarly promises analytics and data exploration in a single platform.

On the data warehouse side, Snowflake has promoted its usefulness as a data lake with support for Python and unstructured data.

Teradata did make an earlier attempt at supporting data lakes, with the ability to run Hadoop in its on-prem analytics platform Aster.

With VantageCloud Lake, Teradata promises centralized object storage (AWS S3 initially) offering open data formats, structured and unstructured data and flexible schema.

Teradata has supported S3 and other cloud storage options since the launch of its cloud-based Vantage platform, but Hillary Ashton, chief product officer at Teradata, told The Register its analytics and data management were now more optimized for S3.

"We support read and write and Enterprise Edition with object store, but it was really optimized for EBS block storage for low latency workloads. With [the new data lake] we have optimized for object storage. That seems subtle, but it's actually a very significant difference. That object store is the primary location for data in the lake edition and it's an auxiliary location in enterprise (data warehouse).

"To say it's optimized for object storage now means that we brought the intelligence of our indexing, and our workload management and brought it down into object storage, which differentiates us from just a typical read, write and to move into object store, and really allows us to bring the IP that we've developed over the years in terms of massive parallel processing and improvements in access time into object store," Ashton said.

In its analytics environment, Teradata has introduced more support for managing the machine learning pipeline. Called Model Ops, the system automates the process of comparing champion and challenger models and picking the most effective one for a given data set.

"Model Ops allows you to manage that process in an automated fashion so that you can constantly be running champion challenger modeling at scale, which means that you're going to get to better analytic outcomes faster," Ashton said.
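For readers unfamiliar with the term, champion/challenger selection can be sketched in a few lines of Python. This is only an illustration of the general idea, not Teradata's implementation or API: the function names, the toy models, the holdout data, and the choice of mean absolute error as the metric are all assumptions made for the example.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical model type: any function mapping one input to a prediction.
Model = Callable[[float], float]


def mean_abs_error(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Average absolute gap between predictions and observed values."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def pick_champion(
    champion: Tuple[str, Model],
    challengers: Dict[str, Model],
    xs: Sequence[float],
    ys: Sequence[float],
) -> Tuple[str, float]:
    """Score the current champion and each challenger on the same holdout
    data; a challenger takes over only if its error is strictly lower."""
    best_name, best_model = champion
    best_err = mean_abs_error(best_model, xs, ys)
    for name, model in challengers.items():
        err = mean_abs_error(model, xs, ys)
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err


# Toy holdout data that follows y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

champion = ("baseline", lambda x: 2.0 * x)
challengers = {
    "with_intercept": lambda x: 2.0 * x + 1.0,
    "constant": lambda x: 4.0,
}

winner, err = pick_champion(champion, challengers, xs, ys)
print(winner, err)
```

In this toy run the "with_intercept" challenger fits the holdout data exactly and replaces the baseline. A production system would typically rerun such a comparison as new data arrives and track more than one metric.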

While this replicates some of the functionality of H2O.ai, Teradata also partners with the ML specialist. "If you've chosen H2O.ai, you can build your models there and then you can import them directly into Teradata Vantage," she said.

Analyst Tony Baer, principal at dbInsight, noted Teradata had developed its own technology instead of adopting open source table formats such as Iceberg (used by Cloudera) or Delta (used by Databricks).

"Given Teradata's longtime positioning for extreme, complex analytics, going to the data lake is a natural move. They are still doing so on their own terms as their data lake table format is not using Iceberg or Delta open source. But never say never there," he said.

Teradata's cloud strategy is an effort to grab some market and customer attention from so-called cloud-native data warehouse systems such as Snowflake, AWS Redshift, Google's BigQuery and Microsoft Azure's Synapse. But it might struggle to convince younger developers to use it given its long history of on-prem systems, Baer said.

"The cloud gives Teradata a chance to expand their footprints with existing customers to take on more discretionary workloads, but they still have their work cut out for them embracing developers, most of whom probably don't know Teradata or view it as 'their father's platform'," he said. ®
