Hands on Microsoft made a big deal of Cortana skills at its Build developer conference earlier this year – the business of creating voice interactions with users via the digital assistant built into Windows 10 and also available for iOS and Android.
Why does the company consider this strategic? The answer was given by a presenter at one of the Cortana sessions, quoting ComScore research stating that by 2020, 50 per cent of all internet searches will be voice.
That figure may prove inaccurate, but the only real argument is over how much voice search will grow, and how fast. Mobile devices, home assistants like Amazon Alexa, Google Home and Apple HomePod, hands-free search while driving, voice control in games consoles: all these things point to a boom in voice-powered interaction.
Even so, Microsoft? With Windows Phone abandoned and Cortana in Windows 10 as annoying as it is useful, does it have much chance in this race? Cortana is available for Android and iOS, but integration is weaker than with Windows Phone, and competing with Google Now and Siri on their native platforms looks like a lost cause.
It may still pay Microsoft to invest in this area. Business use is one possibility. Although Windows Phone is dead as a mainstream contender, Windows 10 remains mobile-capable, so there will always be mobile devices running Windows 10, ranging from laptops to tablets to phones. Microsoft also has a foothold in the home via Xbox.
There is also a cross-platform aspect to many of Microsoft’s services today. The Bot Framework, used by Cortana skills, also supports third-party platforms including Facebook Messenger, Slack and Twilio. It would not be surprising to see this extended to other digital assistants.
Another possibility is that Cortana one day becomes what Microsoft intends it to be: a single, personalised point of interaction that lets you launch applications, book travel, arrange meetings, make phone calls, send messages, search the web, and get answers to almost any question. If that were the case, using Cortana for things like product support would make perfect sense.
Microsoft claims “145 million monthly active users” for Cortana.
The company’s overall goal, though, is not so much to promote Cortana as to push its Cognitive Services, a suite of intelligence APIs covering vision, speech, language understanding, search, knowledge and more. The current count is 36 services, of which six are early access technologies and 17 are in preview. Cortana skills are also in preview, so developers experimenting in this area can expect rough edges.
How rough? Let's find out.
How Cortana skills work
A Cortana skill is essentially a channel of artificial intelligence accessed through Cortana. There are built-in skills like search, setting reminders, or launching applications; and there are add-in skills built by third parties. Add-in skills are accessed via invocations, key phrases which are registered with Microsoft, so that when the user says, for example, “Ask <invocation name> <something>”, the question is passed to the registered service rather than being handled by built-in skills. There are currently 17 words you can use before the invocation name, and these are language-specific. Unfortunately the only language available in the preview is US English, though if you are in the UK or elsewhere, you can easily set Cortana to US English in order to test a Cortana skill.
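To make the routing idea concrete, here is a minimal sketch of how an “Ask <invocation name> <something>” phrase might be matched against registered names. This is an illustration only, not Microsoft's implementation: the function `route_utterance`, the `LEAD_WORDS` tuple and the skill names are all hypothetical, and only “ask” is included because the other lead-in words are not listed here.

```python
from typing import Iterable, Optional, Tuple

# Only "ask" is taken from the article; the full set of lead-in words
# is not listed there, so it is deliberately left out of this sketch.
LEAD_WORDS: Tuple[str, ...] = ("ask",)


def route_utterance(utterance: str,
                    invocation_names: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (invocation_name, remainder) if the utterance targets a
    registered add-in skill, or None to fall back to built-in skills."""
    words = utterance.strip().split()
    if not words or words[0].lower() not in LEAD_WORDS:
        return None
    rest = words[1:]
    # Try names with the most words first, so that "Contoso Travel"
    # is preferred over a shorter name such as "Contoso".
    ordered = sorted(invocation_names, key=lambda n: len(n.split()), reverse=True)
    for name in ordered:
        parts = name.lower().split()
        if [w.lower() for w in rest[:len(parts)]] == parts:
            return name, " ".join(rest[len(parts):])
    return None


# Hypothetical registered invocation names.
SKILLS = ["Contoso", "Contoso Travel"]
print(route_utterance("Ask Contoso Travel to book a flight", SKILLS))
```

Anything that does not start with a recognised lead-in word, or does not name a registered skill, returns `None` and would be left to the built-in skills.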
Once your skill has been invoked, Cortana is temporarily at your disposal. You can converse with the user, for example. You can also trigger actions, though these are limited to URIs (Uniform Resource Identifiers), allowing things like navigating to a web site, starting an email, or launching a registered application.
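Since actions come down to URIs, the work on the skill side is largely a matter of building well-formed ones. The sketch below uses only the Python standard library to assemble a `mailto:` URI and an `https:` URI. The helper names and example addresses are made up for illustration, and how the URI is then handed back to Cortana is not shown.

```python
from urllib.parse import quote, urlencode


def mailto_uri(to: str, subject: str = "", body: str = "") -> str:
    """Build a mailto: URI that opens a pre-filled email."""
    fields = {k: v for k, v in (("subject", subject), ("body", body)) if v}
    uri = "mailto:" + quote(to, safe="@")
    if fields:
        # quote_via=quote encodes spaces as %20 rather than "+",
        # which is what mail clients expect in mailto: URIs.
        uri += "?" + urlencode(fields, quote_via=quote)
    return uri


def web_uri(host: str, path: str = "/", **query: str) -> str:
    """Build an https: URI for navigating to a web page."""
    uri = "https://" + host + quote(path)
    if query:
        uri += "?" + urlencode(query, quote_via=quote)
    return uri


print(mailto_uri("support@example.com", subject="Order 42", body="Hello there"))
print(web_uri("example.com", "/orders/42", lang="en"))
```

Percent-encoding matters here because text taken from a voice conversation will routinely contain spaces and punctuation.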
If you need authentication, there are a few options. You can register your skill to require the user’s Microsoft account, or set up a connected account using an OAuth 2.0 provider (an open standard used by Microsoft, Google, Facebook and Twitter among others), or have Cortana pop up a sign-in dialog. This last method allows use of Azure Active Directory, as used by Office 365. You do not have to require sign-in immediately, so you might defer it to the moment when the user takes an action such as making a purchase.
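For the connected account option, the first step of a standard OAuth 2.0 authorisation code flow is to send the user to the provider's sign-in page with the right query parameters. The sketch below shows that step in generic form. The endpoint, client ID, redirect URI and scopes are placeholders, not values from any real provider or from the Cortana skill registration, which may well handle this step for you.

```python
import secrets
from typing import Iterable, Optional, Tuple
from urllib.parse import urlencode


def authorization_url(authorize_endpoint: str,
                      client_id: str,
                      redirect_uri: str,
                      scopes: Iterable[str],
                      state: Optional[str] = None) -> Tuple[str, str]:
    """Return (url, state) for the first leg of an OAuth 2.0
    authorisation code flow. The caller should store the state value
    and check it when the provider redirects back."""
    state = state or secrets.token_urlsafe(16)
    params = {
        "response_type": "code",    # ask for an authorisation code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # scopes are space-separated
        "state": state,             # guards against forged redirects
    }
    return authorize_endpoint + "?" + urlencode(params), state


# All values below are placeholders for illustration.
url, state = authorization_url(
    "https://login.example.com/authorize",
    "my-client",
    "https://skill.example.com/callback",
    ["profile", "email"],
)
print(url)
```

Deferring sign-in, as described above, would simply mean generating this URL at the point of purchase rather than when the skill is first invoked.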
Once authenticated, your bot can access user details, subject to permission. This means you can get details including the user’s name, location, and email address. In the case of Azure Active Directory, your service can get access to the Microsoft Graph, a unified API that fronts mainly Office 365 services, including teams, calendar, documents and tasks.
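As a rough idea of what a Microsoft Graph call looks like once you hold an access token, the sketch below builds a request for the signed-in user's profile. It assumes the Graph v1.0 `/me` endpoint and a bearer token obtained through the Azure Active Directory sign-in. How the token reaches your bot is not covered here, and the helper names are hypothetical.

```python
import json
import urllib.request

# Microsoft Graph v1.0 profile endpoint for the signed-in user.
GRAPH_ME = "https://graph.microsoft.com/v1.0/me"


def build_profile_request(access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the user's profile."""
    return urllib.request.Request(
        GRAPH_ME,
        headers={
            "Authorization": "Bearer " + access_token,
            "Accept": "application/json",
        },
    )


def fetch_profile(access_token: str) -> dict:
    """Send the request and decode the JSON response.
    Needs network access and a valid token, so it is not called here."""
    with urllib.request.urlopen(build_profile_request(access_token)) as resp:
        return json.load(resp)


req = build_profile_request("PLACEHOLDER_TOKEN")
print(req.get_method(), req.full_url)
```

Which fields come back depends on the permissions the user has granted, so a real skill should treat any individual field as optional.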