Microsoft has teased Azure Purview, a service built to deal with customers' data compliance and governance requirements by verifying the presence of sensitive information without users having to eyeball the stuff first.
The Purview Preview (not to be uttered after a few shots of the Azure AI-infused whisky) has arisen from in-house technology developed by Microsoft as part of its own internal privacy and compliance efforts.
Designed to manage the company's own epic amounts of data, the platform has been released alongside a new, generally available version of data warehouse wrangler Azure Synapse Analytics.
Purview is the latest in a line of tools from Microsoft aimed at helping customers to get on top of the oceans of data sloshing around in their organisations. In this case, that assistance takes the form of an Azure service going sniffing for sensitive information.
The cloud service builds on existing search capabilities and uses connectors to hook up various sources, from Microsoft's own Azure SQL, Cosmos DB or Azure Blobs, to the external, such as SAP, Teradata or Amazon AWS S3.
A scheduled or one-off serverless scan slurps the metadata from these sources, exposing a data map through open Apache Atlas APIs (and thus enabling the programmatic pushing of more data to expand things further).
During the scan, classifications can be applied based on column names or "the system can still scan the content of the columns to verify the presence of sensitive information," explained Engineering Lead Mike Flasko.
The latter point might raise an eyebrow or two, although Flasko emphasised that the data itself is not actually moved.
As for how that checking works, rules can be configured and the sensitivity labels defined in the Microsoft 365 Compliance Center used. Purview is also integrated with Microsoft Information Protection and, unsurprisingly, lurks in the Microsoft 365 E5 compliance plan.
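To make the two approaches concrete, here is a minimal sketch of name-based and content-based classification. It is an illustration only, not Purview's implementation or API: the rule sets, the `classify_column` function and the `min_match_ratio` threshold are all hypothetical, and the patterns are deliberately crude.

```python
import re

# Hypothetical rules for illustration; not Purview's classification set.
NAME_RULES = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
}
CONTENT_RULES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def classify_column(name, sample_values, min_match_ratio=0.6):
    """Return a sorted list of labels suggested for one column."""
    # Cheap pass: does the column name itself look sensitive?
    labels = {label for label, rx in NAME_RULES.items() if rx.search(name)}

    # Content pass: test sampled values, ignoring empty cells.
    values = [str(v).strip() for v in sample_values if v not in (None, "")]
    if values:
        for label, rx in CONTENT_RULES.items():
            hits = sum(1 for v in values if rx.fullmatch(v))
            # Assumed threshold: label the column if enough samples match.
            if hits / len(values) >= min_match_ratio:
                labels.add(label)
    return sorted(labels)
```

A column blandly named `contact` would slip past the name rules, but `classify_column("contact", ["a@b.com", "c@d.org", "n/a"])` still returns the email label because two of the three sampled values match.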
The timing, hot on the heels of the privacy furore surrounding the company's Productivity Score, is interesting. With Purview, Microsoft is keen to emphasise the increasing focus on data governance within enterprises and leaned on the GDPR (General Data Protection Regulation) to make its point.
"It's a very welcome step," Prof Alan Woodward from Centre for Cyber Security, University of Surrey, told The Register. "In even a modest organisation the data assets can be spread across so many different systems that it can be very difficult to ensure that you even know where it all [is] never mind ensuring that it's protected appropriately. That’s the ambition of initiatives like Purview."
Noting that the distribution of data over many environments and physical locations was complicating things and making tools like Purview and its ilk "almost essential," Woodward added that the discovery part of the process could be tricky, and warned that the simple act of searching for terms relating to personal data might result in some accidental non-compliance.
"That's why tools like this have to be built to take that into account," he said.
"What is good about these tools," he went on, "is that it means you don't have a person doing it - if you effectively automate the discovery process, even if you find sensitive data held somewhere it shouldn't be, the technology can flag it without necessarily having to reveal to full content."
Information privacy and security expert John Wunderlich sounded a note of caution and told El Reg: "Much will depend on context and it's very hard to find and categorize sensitive data as text vs as data. Machines don't parse word documents or emails for sensitive texts all that well."
"One big risk will be false claims," he added. "Doing a comprehensive data inventory of an enterprise is expensive and time consuming. So the temptation will be to do the 'easy' stuff and call it done."
The Purview team plans to keep adding data sources, and Flasko observed that "understanding your data is one of the most important steps in efficient data governance."
Indeed it is. ®