Teradata CTO Stephen Brobst drowns data lakehouse concept
Colorful data techie seeks to fend off rivals
Interview Despite industry efforts to get both data exploration and business analysis workloads onto a single "lakehouse" system, separate data lakes and warehouses are still required for effective enterprise analytics and BI systems, Teradata's CTO tells The Register.
Speaking at the vendor's London conference last week, Stephen Brobst sought to set his vision apart from recent trends espoused by rivals that increasingly see data management, analytics, BI, and machine learning bundled together on one platform.
"You need to have a unified architecture, but they are discrete things. There is a difference between the raw data, which is really data lake, and the data product, which is the enterprise data warehouse," Brobst says.
Moves from rival players in the broad enterprise data and analytics markets appear to have set a different path. Born from the world of Hadoop and Apache Spark, Databricks has a long history in data lakes – where businesses dump structured and unstructured data for analytics and exploration – and has more recently added SQL support to its hybrid lakehouse system, where it encourages users to support both data exploration and regular business analytics workloads.
Databricks, Snowflake, Cloudera, and Google are all betting on running the two distinct workloads in the same environment, each building on its own approach to data warehouses and data lakes.
Although Teradata launched its own data lake in August, in part by improving optimization for object stores such as AWS S3, Brobst said there was an important distinction between where businesses put their raw data and the data warehouse, which optimizes query performance and controls governance.
He explains that although some Teradata customers use Databricks for their data lake, he advises against implementations where they persist the data in the lake, add key assignments, and do some light integration there.
"This is actually not very useful, because you don't want to have more copies of data than is necessary. If it adds value, OK, fine, but my view is that if you've done the hard work of adding the key structures and the homogenization of the data types, just put it in the … enterprise data warehouse," he says.
Famous for his Hawaiian shirts and dramatic gesticulation during impassioned discussion of data warehousing architecture, Brobst graduated in computer science at UC Berkeley and gained a PhD at MIT. He was part of the leadership team who took Teradata public on NYSE in 2007 and has been one of the driving forces behind the growth of data warehousing in business for four decades.
He says the data lake is a "robust" concept, but distinct from the data warehouse, although they should be interoperable and within the same logical architecture.
"Vendors are all trying to claw it into their direction, but when done right the data lake is a good concept to land the raw data and have a low cost retention of that data in its original form," Brobst says.
"We can afford to keep that raw data in the data lake and retain it and then the data scientists and the power users, they can use that raw data and explore it and decide which data surgically should be promoted in the data product and which one shouldn't, so it's not all or nothing anymore."
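The promotion pattern Brobst describes can be sketched in a few lines of Python. This is an illustrative sketch only, not Teradata's implementation: the record layout, the field names, and the `promote` function are all assumptions invented for the example. Raw records stay untouched in the "lake" list, while only the chosen fields are type-homogenized, given a surrogate key, and copied into the curated "warehouse" table.

```python
from datetime import date

# Hypothetical raw "lake" records, kept in their original, inconsistent form.
raw_lake = [
    {"cust": "A17", "amount": "19.99", "day": "2022-10-01", "debug": "x1"},
    {"cust": "B02", "amount": 5, "day": "2022-10-02", "debug": "x2"},
    {"cust": "A17", "amount": "7.50", "day": "2022-10-03", "debug": "x3"},
]


def promote(records, fields):
    """Copy only the chosen fields into a curated table.

    Types are homogenized and each distinct customer gets a surrogate key;
    the raw records are left unmodified.
    """
    customer_keys = {}  # natural customer id -> surrogate integer key
    curated = []
    for rec in records:
        key = customer_keys.setdefault(rec["cust"], len(customer_keys) + 1)
        row = {"customer_key": key}
        if "amount" in fields:
            row["amount"] = float(rec["amount"])  # str or int -> float
        if "day" in fields:
            row["day"] = date.fromisoformat(rec["day"])  # str -> date
        curated.append(row)
    return curated


# Promote only the amount and day fields; "debug" stays in the lake.
warehouse = promote(raw_lake, fields={"amount", "day"})
```

The raw list is never modified, so the fields left behind remain available for later exploration and can be promoted in a later pass if they prove useful.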
After the advent of so-called cloud-native data warehouses, which saw Snowflake's spectacular $33 billion IPO, Teradata was perceived as something of a laggard. In May 2020, disappointing financials prompted a switch of CEO.
Despite being powered by recent developments in cloud computing, Teradata has learned from the Hadoop and "cloud-native" hype cycles that a focus on architecture, rather than specific technologies, is the secret to success.
"Good architecture is not about technology," Brobst says. "When the technology changes and improves, you're still in a good position. Nobody in their right mind today would actually be deploying in Hadoop Distributed File System. But architecturally, I don't actually care if it's HDFS or object store, although I do of course care about implementation cost and total cost of ownership."
But technology developments have caused Teradata to rethink its investments. An example of the volte-face was the 2014 acquisition of Rainstor – maker of a de-duplication engine adapted for Hadoop – which was later retired as Teradata refocused its efforts on cloud and blob storage.
"Rainstor was largely a Hadoop-based infrastructure for on-prem and so on. So that basically retired as we moved everything to cloud and object storage," Brobst says.
He explains that Teradata faced a different set of challenges to the vendors building data warehousing systems in the cloud from scratch. It wanted to keep its existing customers – which include some of the world's largest banks, retailers, and consumer goods firms – by supporting APIs from on-prem systems in the cloud, easing migration. "We took the harder road in the sense that we maintain compatibility on the APIs so our customers do not have to go through a huge rewriting of everything in order to move to the cloud," he says.
Rusty Warner, Forrester principal analyst, said Teradata's main mission was to focus on existing customers.
"It has to convince people to build on the investments they've made and see them as investments as opposed to legacy tech. It wants to help them figure out what that architecture looks like as it moves to the cloud. It's about not losing their base basically as they face these competitors that are dangling shiny objects in front of those customers who have relied on Teradata for many years."
But this raises the question of whether Teradata can grab new customers as more see cloud-based analytics underpinning business activity, especially in web-based commerce.
"Teradata has a lot of the traditional big companies in the world as clients, so there is not much greenfield in that sense," Warner told The Register. "It would be tech startups without tech debt. That's harder for them to compete with some of the newer players in the market. But I think there is a way to help organizations, especially in verticals like financial services or telcos, get a better handle on what the ecosystem should look like."
One data warehousing expert told The Reg the industry likens Brobst to Back to the Future's Doc Brown. In that sense, the challenge is similar. With old customers and new ones at different stages in their use of enterprise data and cloud adoption, he has to appear in more than one timeline and still make sense. ®