Leave your data where it is, we'll look at it there, says SAP
German software giant opens analytics up to external sources
SAP has opened up its analytics system to data outside the enterprise software vendor's environment, and penned partnerships with companies claiming to lead the charge in modern-day data management.
The move should be welcomed by SAP customers hoping to justify their continued investment in the SAP Business Warehouse world. Called SAP Datasphere, the new product is actually an upgrade to the SAP Data Warehouse Cloud.
SAP claims it will allow data professionals to access mission-critical biz data whether they are in the SAP data warehouse, SAP applications, or applications and data stores from other vendors.
The new product will offer a "unified experience" for data integration, data cataloging, semantic modeling, data warehousing, data federation, and data virtualization, SAP told us.
The 50-year-old German software giant built its reputation on Enterprise Resource Planning, a single source of truth for operational data. But with SAP Datasphere, built on SAP Business Technology Platform and powered by the database SAP HANA Cloud, the idea is to allow data analysts working in the SAP analytics environment to work on data wherever it resides, even outside SAP.
CTO Juergen Mueller said SAP wants to help customers "easily and confidently integrate SAP data with non-SAP data from third-party applications and platforms, unlocking entirely new insights and knowledge to bring digital transformation to another level."
If all this sounds familiar, it is because last year saw a trend among data platform vendors opening up to data outside their environments. Snowflake, Google Data Cloud, and Cloudera all announced support for the open table format Apache Iceberg to help achieve this goal. Meanwhile, Tabular, a company founded by Iceberg's creators, has promised a "headless" data warehouse for a similar purpose.
As part of its Datasphere news, SAP confirmed a series of agreements with companies including data governance vendor Collibra, streaming data platform Confluent, machine learning platform DataRobot, and Databricks, the data lake/lakehouse company originally built around Apache Spark.
The latter liaison could help inform SAP's approach to integrating data outside the SAP environment into the SAP analytics platform without moving the data from the source system.
Databricks has its own open source table format, Delta, though it also works with Iceberg and Hudi, another open source table format.
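The reason these table formats enable access-in-place is that a table is just data files plus metadata on shared storage, so any engine that understands the layout can read it without copying the data out. A toy sketch of that idea in Python, using a made-up JSON manifest rather than the actual Delta, Iceberg, or Hudi specifications (which use Parquet data files and far richer transaction metadata):

```python
import csv
import json
import tempfile
from pathlib import Path

# Toy stand-in for an open table format: plain data files plus a
# manifest describing them. Illustrative only; not a real format.
store = Path(tempfile.mkdtemp())

(store / "part-0.csv").write_text("order_id,amount\n1,100\n2,250\n")
(store / "part-1.csv").write_text("order_id,amount\n3,75\n")
(store / "_manifest.json").write_text(json.dumps({
    "schema": ["order_id", "amount"],
    "files": ["part-0.csv", "part-1.csv"],
}))

def read_table(root: Path) -> list[dict]:
    """Any 'engine' that understands the manifest reads in place."""
    manifest = json.loads((root / "_manifest.json").read_text())
    rows = []
    for name in manifest["files"]:
        with open(root / name, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows

rows = read_table(store)
total = sum(int(r["amount"]) for r in rows)
print(len(rows), total)  # 3 rows, 425 total
```

Two different query engines pointed at the same manifest would see the same table, which is the property SAP is relying on to leave customer data where it sits.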
Irfan Khan, SAP HANA president and chief product officer, said source systems would have to expose some of the underlying data, within relational models or object-relational models.
He said users would have to prepare or access a pre-existing data model built on one of the open table formats.
"For example, Delta Lake is an OTF that is supported by Databricks, but also Iceberg and Hudi are out there as well. Our fundamental approach really is the concept of a data product in a source system: that could be exposing existing data products, or you could construct data products. You are essentially modelling the source systems," Khan said.
Datasphere would then be able to "crawl through" these different endpoints, he said. "We're going to search out these distinct data products, build it into the data catalogue, and that catalogue could be seamlessly integrated within Datasphere itself, or through the partnerships like the one with Collibra."
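The crawl-and-catalog step Khan describes can be pictured as aggregating data products from several source endpoints into one lookup structure. The sketch below is purely illustrative: the names, shapes, and the `crawl` helper are hypothetical, not SAP's or Collibra's actual API.

```python
from dataclasses import dataclass

@dataclass
class DataProduct:
    name: str
    source: str
    fmt: str  # e.g. "delta", "iceberg", "hudi" (illustrative values)

def crawl(endpoints: dict[str, list[DataProduct]]) -> dict[str, DataProduct]:
    """Walk each source endpoint and build a flat catalog
    keyed by a qualified product name."""
    catalog = {}
    for source, products in endpoints.items():
        for p in products:
            catalog[f"{source}.{p.name}"] = p
    return catalog

# Hypothetical source systems exposing data products:
endpoints = {
    "s4hana": [DataProduct("sales_orders", "s4hana", "native")],
    "databricks": [DataProduct("web_clicks", "databricks", "delta")],
}
catalog = crawl(endpoints)
print(sorted(catalog))  # ['databricks.web_clicks', 's4hana.sales_orders']
```

In this picture, the catalog holds only metadata pointing back at each source system; queries would still be federated out to where the data lives.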
Hyoun Park, CEO and chief analyst with Amalgam Insights, said it was a "big deal" that SAP could access external data more easily without having to "rigidly define how it fits in the data warehouse."
With the Databricks and DataRobot partnerships, SAP is much better positioned to support machine learning use cases, he said. "It's a good announcement and one I'd be excited about as an SAP user trying to justify ongoing investment in SAP."
Park said there were two big challenges with SAP Data Warehouse Cloud as a core enterprise data environment. "First, as a silo, it has been a destination that third-party data needs to be replicated into. Second, as a technology focused on structured data, it has sometimes lacked certain capabilities to support the variety of data needed to support certain real-time and machine learning use cases without additional augmentation."
But the Datasphere approach made it more like a data fabric, offering access to data for machine learning and analytics wherever it resides.
"That potential of aligning the already-massive stores of SAP data with the ongoing transformation of the external data world outside of SAP will allow SAP users to take fuller advantage of their existing data for an AI-powered future, where they will be asked to align existing technologies to a near-term future of services such as ChatGPT as well as face the need to build their own custom AI models and overlays. Augmenting the SAP Data Warehouse to support business context and machine learning would be a step towards maintaining the value of this foundational store of data." ®