Informatica has launched a SaaS product that aims to manage data governance and catalogues in a single system.
While cloud platform and data lake vendors are encouraging organisations to use their tools to list and find data sources, as well as understand data provenance and quality, the 28-year-old data integration outfit argues users need a neutral provider to oversee these tasks.
To this end, Informatica is releasing Cloud Data Governance and Catalog, which it claims is an enterprise-scale, seamless data governance and catalogue service. It is a component of the company's Intelligent Data Management Cloud (IDMC), launched in April.
The point is to oversee traditional data warehousing and analytics, as well as the governance of machine learning, in a single tool, according to Informatica.
Speaking to The Register, David Corrigan, Informatica general manager of data governance and privacy, said: "A lot of manual work goes into just explaining machine learning models, while the other side of the coin is the data. In order to explain how the model was trained, one needs to understand what training sets and what data were used for those models. What was the quality of that data, the biases that were inherent in the data? The data that goes into the analytic model is crucial in explainability for both how it was trained and the resulting outputs."
He said the idea of the SaaS product was to help governance professionals, as well as analytics and AI professionals, "search for and find analytic models and associated datasets, and to understand the relationship between them."
But Informatica is not the only vendor in town claiming to crack this particular nut. AWS has Glue, while Google Cloud Platform and Microsoft Azure also have their own data catalogue products. And Databricks, the data lake and machine learning platform, launched Unity Catalog in June. It uses industry-standard ANSI SQL and is designed to offer one interface to access both structured and unstructured data, across all cloud data lakes, to help users get a single view of their data on the Databricks Lakehouse Platform.
The problem is that not all data, analytics, or AI/ML sits on a single platform. SAP, Oracle, and Salesforce all have analytics and AI/ML built into their application platforms, for example.
Here, Corrigan said, Informatica offers the ability to "scan and profile metadata from many different enterprise systems: cloud environments as well as application vendors and on-prem ones as well. We bring in that metadata, and [use] our own [ML tools] to classify it, catalogue it, and associate [a] business glossary of terms with it."
He added: "It's not good enough to optimise just marketing if you don't understand the experience that a customer has once they buy your product, for example. Enterprise cataloguing and governance needs to become a sort of a neutral Switzerland of data and an equal citizen to all of these other clouds and applications to get a true understanding of data."
Informatica is not the only firm claiming to sell this cross-enterprise catalogue and control either. Talend and Dataiku are among those promoting products with similar ambitions.
But Informatica's investment in re-engineering its suite for the cloud – instead of just taking a lift-and-shift approach – makes some of its claims credible, said Martin Kuppinger, founder and principal analyst with independent research firm KuppingerCole.
"With Informatica's approach and investment to come up with a modern solution, they do it well," he said. "If you want to go strategic and get a grip on your data, then having solutions, such as Informatica, that are able to use all the data, regardless of where it resides, is a must. So I think what they do is actually meaningful." ®