SharePoint Syntex: Microsoft rolls out AI that automatically categorises documents

Starts trying to make fetch happen... build your own models to parse corporate content

Ignite: Microsoft's SharePoint Syntex is a new feature of SharePoint Online that promises to extract metadata from documents automatically, making it easier to find and categorise information.

SharePoint Syntex, currently in preview but with general availability promised for 1 October, is the first product based on Project Cortex, a wider technology unveiled at Ignite 2019.

The core idea is to use AI to parse content stored in Microsoft's cloud, drawing not only on the words, images, and links in the documents, but also on other signals in the Microsoft Graph, such as who is engaging with the content and what departments they are in.

Syntex can drive document workflows such as approval after categorising documents through AI-powered analysis, presented in the Content Center.

Microsoft said that after seeing how Project Cortex was used by preview customers, it has decided to build multiple projects on the technology, rather than one. SharePoint Syntex is the first: a premium add-on for SharePoint Online focused on using AI for content understanding and automation, such as routing a document to the right person for approval.

This is not the first time we have seen AI applied to SharePoint content. Microsoft introduced Office Delve in 2014, also based on the Office Graph, the theory being that it automatically shows users the documents that are most relevant to them. Delve has had little impact – will Syntex be different?

It is early days, but Syntex is more ambitious than Delve. Delve was focused on surfacing relevant content for a user, whereas Syntex can add metadata to documents that in theory could save substantial manual effort. Syntex could parse a purchase order, for example, work out the monetary value, the customer, and the region where the customer is based, and another process could forward it to the appropriate team to progress the order.
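In the purchase-order scenario, the AI's job ends once the metadata is extracted; forwarding the order is ordinary workflow logic acting on those values. The Python sketch below illustrates only that second step, under stated assumptions: the field names (`customer`, `region`, `amount`), the team addresses and the approval threshold are all hypothetical, invented for illustration, and do not reflect any Syntex or Microsoft Graph schema or API.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical metadata a content-understanding model might attach to a
# purchase order. These field names are illustrative, not Syntex's schema.
@dataclass
class PurchaseOrderMetadata:
    customer: str
    region: str
    amount: float


# Hypothetical mapping from customer region to the team that progresses orders.
REGION_TEAMS = {
    "EMEA": "sales-emea@example.com",
    "APAC": "sales-apac@example.com",
    "Americas": "sales-americas@example.com",
}

# Hypothetical value above which an order also needs sign-off.
APPROVAL_THRESHOLD = 50_000.0


def route(po: PurchaseOrderMetadata) -> List[str]:
    """Return the recipients a downstream workflow would forward the order to."""
    # Unrecognised regions go to a human triage queue rather than a guess.
    recipients = [REGION_TEAMS.get(po.region, "triage@example.com")]
    if po.amount >= APPROVAL_THRESHOLD:
        recipients.append("approvals@example.com")
    return recipients


print(route(PurchaseOrderMetadata("Contoso", "EMEA", 72_500.0)))
# ['sales-emea@example.com', 'approvals@example.com']
```

Sending unrecognised values to a person instead of guessing is one simple way to limit the damage when extraction gets something wrong.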

According to general manager Seth Patton, Syntex processes three different types of content: images, forms, and unstructured documents. It will tag images with "thousands of commonly recognized objects", make tags by recognising handwritten text, and read the fields in forms including parsing of dates, numbers, names, and addresses.

Syntex documents are surfaced in a new Content Center, which sorts documents into libraries and shows the metadata it has extracted as columns. Syntex tagging can also be used for compliance, adding retention or sensitivity labels, and setting things like encryption, sharing restrictions, and conditional access policies.

Creating a custom model in Syntex by training it on files whose content has been labelled to identify the metadata

The most intriguing part of Syntex is the ability to train new models for extracting metadata from documents. Every business has its own terms and categories. Syntex has a model creation feature where you can define entities, such as "Contractor" or "Fee amount", mark existing documents with labels identifying the values for these entities, and submit these to train a model that will enable AI to extract them automatically from new documents.
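The define, label, train loop can be illustrated with a deliberately naive Python sketch. This is not how Syntex works, and Microsoft has not described its models at this level. The toy "trains" by recording the two words that precede each labelled value in the example documents, then, in a new document, returns the word that follows the most frequently seen cue. The entity ("Fee amount"), the sample sentences, the helper names and the two-word window are all hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical labelled training files for a "Fee amount" entity: each pair is
# (document text, the value a user marked as the fee). Values are single words.
TRAINING_EXAMPLES = [
    ("The contractor will charge a fee of 1200 per month", "1200"),
    ("Services are billed at a fee of 950 per project", "950"),
    ("Total fee of 3000 is due on signing", "3000"),
    ("We agree a fee of 450 for the review", "450"),
    ("Consulting fee of 800 applies to each visit", "800"),
]

WINDOW = 2  # number of preceding words treated as a cue (an arbitrary choice)


def learn_cues(examples, window: int = WINDOW) -> Counter:
    """Count the word sequences that immediately precede each labelled value."""
    cues: Counter = Counter()
    for text, value in examples:
        words = text.split()
        if value in words:
            i = words.index(value)
            if i >= window:
                cues[" ".join(words[i - window:i]).lower()] += 1
    return cues


def extract(text: str, cues: Counter, window: int = WINDOW) -> Optional[str]:
    """Return the word after the most frequently learned cue, or None if no cue appears."""
    words = text.split()
    best, best_count = None, 0
    for i in range(window, len(words)):
        count = cues[" ".join(words[i - window:i]).lower()]  # 0 for unseen cues
        if count > best_count:
            best, best_count = words[i], count
    return best


cues = learn_cues(TRAINING_EXAMPLES)
print(extract("Under this contract a fee of 2750 will be invoiced", cues))  # 2750
print(extract("No amount is mentioned here", cues))  # None
```

Five labelled files are enough for this toy only because every example uses the same phrasing; a document saying "fees of" or "charged 2750" would defeat it, which hints at why a useful model has to draw on far richer signals than literal string matches.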

As few as five files to train the model

Naomi Moneypenny, director of program management for Syntex, said at Ignite that as few as five files could be sufficient for training, particularly if users supply both positive and negative examples of a particular content type. Form processing, which should be the easiest type of content from which to extract metadata, has a specific form processing engine.

Content processed by Syntex does not have to live in SharePoint, but can also be sucked in from other sources via Microsoft Graph content connectors. Examples of such sources include file shares, Azure SQL, Box, Amazon S3, Google Drive, SharePoint on-premises, and Salesforce.

Microsoft spoke at Ignite about new features planned for Syntex early next year, which include expanded model types, central model management, Syntex-based solutions for business processes, and more integration between Syntex and "knowledge improvements across Microsoft 365".

All a bit vague, but you get the impression that the company sees AI-driven content analysis as a significant piece of its Microsoft 365 offering.

Whereas Delve was free for licensed SharePoint users, Syntex is a paid-for service available to E3 or E5 subscribers to Microsoft 365. The pricing looks complex, being per-user and limited to "500 items indexed by content connector, pooled", according to a slide presented at Ignite. Customers also get credits for form processing. Presumably additional fees apply if these limits are exceeded.

The question hanging over all of this is whether the company is over-promising when it comes to the benefits of Syntex. Considering the complexity of the underlying data science, the company's ability to simplify the use of AI services, whether in Syntex or its wider Cognitive Services portfolio, is not in doubt.

AI is an inherently imperfect technology, though, which is worrying in a business context if organisations depend on it too much, for example, to decide whether or not a document is confidential. As a paid-for service, Syntex will have to deliver high enough accuracy to justify its cost.

Whether or not Syntex flies, you can bet Microsoft, like others in the document management area, will continue to apply AI technology in the hope of making better sense of these repositories of unstructured data. ®

Biting the hand that feeds IT © 1998–2020