Alexa Conversations: Amazon's AI assistant is about to get a whole lot more like Clippy

Hi! It looks like you're planning a night out! Would you like a taxi with that?

re:MARS At Amazon's AI event in Las Vegas this week, the company introduced Alexa Conversations, a new way to code skills that support more natural conversation and participate in multi-topic interactions.

Alexa Skills are third-party extensions to Amazon's chat bot. Developers register their own invocation name, a custom phrase that follows the "Alexa" wake word. So, for example, you could create a Reg news skill invoked by something like "Alexa, ask El Reg for today's tech news" and get back some content.

Vendors can also use skills to sell their stuff. If you can persuade a user to authorise your skill for Amazon Pay, "customers can pay for physical goods and services in your skill using the information already in their Amazon account – without needing to remember their username and password".

The challenge in writing these things lies in parsing human burblings into a simplistic set of intents (supported actions) along with what Alexa jargon calls "slots", which specify the target of the intent. It is not too bad with something relatively simple like controlling music, with intents such as play, stop and set volume, and slots such as tracks, artists and albums. Anything more sophisticated is hard to code and arduous for the user, which is why those customer service bots that attempt to triage your support call are so frustrating; we take every possible shortcut to get to an actual human.
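To make the intents-and-slots idea concrete, here is a toy matcher. It is a minimal sketch, not how Alexa's natural language understanding works (Amazon uses statistical models, not regular expressions). The interaction model, intent names and the helpers `sample_to_regex` and `parse` are all hypothetical, loosely modelled on the way skill developers list sample utterances with `{slot}` placeholders.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical toy interaction model for a music skill. Each intent lists
# sample utterances; {slot} placeholders mark the words that name the target.
INTERACTION_MODEL = {
    "PlayIntent": ["play {track} by {artist}", "play {album}"],
    "StopIntent": ["stop", "stop the music"],
    "VolumeIntent": ["set volume to {level}"],
}


def sample_to_regex(sample: str) -> "re.Pattern[str]":
    """Compile a sample utterance into a regex with one named group per slot."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", sample)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))      # literal words
        else:
            pieces.append(f"(?P<{part}>.+?)")   # slot value
    return re.compile("^" + "".join(pieces) + "$", re.IGNORECASE)


def parse(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent, slots) for the first matching sample, or None."""
    text = utterance.strip().lower()
    for intent, samples in INTERACTION_MODEL.items():
        for sample in samples:
            match = sample_to_regex(sample).match(text)
            if match:
                return intent, match.groupdict()
    return None
```

With this model, `parse("Play Yesterday by The Beatles")` yields `("PlayIntent", {"track": "yesterday", "artist": "the beatles"})`, while `parse("book me a taxi")` yields `None`. The brittleness is the point: every phrasing has to be anticipated, which is why anything beyond simple commands gets arduous.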

Dialogue management in an Alexa Skill is about writing logical flows that get the required information to identify and confirm intents and entities. Once you have ascertained that the user requires a taxi, for example, you would want to identify the destination, time, number of travellers, and ensure that, say, Leeds Castle in Kent is not confused with a castle in Leeds.
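A hand-coded flow of this kind boils down to checking which slots are still missing or ambiguous and prompting for them in turn. The sketch below assumes a hypothetical gazetteer (`PLACES`), slot list and prompt wording; it is not the Alexa Skills Kit dialog API.

```python
from typing import Dict, List, Optional

# Hypothetical gazetteer: one spoken destination can map to several places.
PLACES: Dict[str, List[str]] = {
    "leeds castle": ["Leeds Castle in Kent", "a castle in Leeds"],
}

# Slots the taxi intent needs before it can be fulfilled, in the order asked.
REQUIRED_SLOTS = ("destination", "time", "passengers")

PROMPTS = {
    "destination": "Where would you like to go?",
    "time": "What time should the taxi pick you up?",
    "passengers": "How many people are travelling?",
}


def next_prompt(slots: Dict[str, str]) -> Optional[str]:
    """Return what Alexa should say next, or None once every slot is filled and unambiguous."""
    for name in REQUIRED_SLOTS:
        value = slots.get(name)
        if value is None:
            return PROMPTS[name]  # elicit the missing slot
        if name == "destination":
            # Unknown names pass through unchanged; ambiguous ones must be confirmed.
            candidates = PLACES.get(value.lower(), [value])
            if len(candidates) > 1:
                return "Did you mean " + " or ".join(candidates) + "?"
    return None
```

Starting from an empty `slots` dict, the skill asks for a destination; after the user says "Leeds Castle" it asks "Did you mean Leeds Castle in Kent or a castle in Leeds?", and only once that is settled does it move on to the pick-up time. Every branch of that flow has to be written by hand, which is what Alexa Conversations aims to replace.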

That was then. Now Alexa Conversations promises to replace fixed dialogue flows with something smarter.

"Alexa Conversations combines an AI-driven dialog manager with an advanced dialog simulation engine that automatically generates synthetic training data," stated Amazon. "You provide API(s), annotated sample dialogs that include the prompts that you want Alexa to say to the customer, and the actions you expect the customer to take. Alexa Conversations uses this information to generate dialog flows and variations, learning the large number of paths that the dialogs could take."

This is a classic AI approach. Instead of trying to code for every situation, you define the model and the goals, provide examples, and let machine learning algorithms figure out what to do – in this case using a "recurrent neural network for modeling dialog flow".
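Amazon has not published its dialog simulator in a form we can reproduce, so the following is only a much cruder stand-in: plain template expansion, which turns a handful of annotated sample dialogs into many concrete ones. The dialog paths, slot catalogues and the `synthesise` helper are hypothetical, and no learning happens here; in Amazon's design a recurrent network is then trained on data of this general sort.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Turn = Tuple[str, str]  # (speaker, text)

# Hypothetical annotated sample dialogs for a cinema-ticket skill: two paths
# to the same booking, with {slot} placeholders in every turn.
DIALOG_PATHS: List[List[Turn]] = [
    # Path 1: the customer gives everything up front.
    [("user", "Get me {count} tickets for the {time} showing of {film}"),
     ("alexa", "Booking {count} tickets for {film} at {time}.")],
    # Path 2: Alexa has to ask which showing.
    [("user", "I want {count} tickets for {film}"),
     ("alexa", "Which showing of {film} would you like?"),
     ("user", "The {time} one"),
     ("alexa", "Booking {count} tickets for {film} at {time}.")],
]

# Hypothetical slot catalogues supplied by the developer.
SLOT_VALUES: Dict[str, List[str]] = {
    "count": ["two", "four"],
    "film": ["Toy Story 4", "Rocketman"],
    "time": ["7pm", "9.30pm"],
}


def synthesise(paths: List[List[Turn]], slot_values: Dict[str, List[str]]) -> Iterator[List[Turn]]:
    """Yield one concrete dialog per (path, slot-value combination)."""
    names = sorted(slot_values)
    for path in paths:
        for combo in itertools.product(*(slot_values[n] for n in names)):
            filling = dict(zip(names, combo))
            # str.format ignores keyword arguments a turn does not use.
            yield [(speaker, text.format(**filling)) for speaker, text in path]
```

Two hand-written paths and three small slot lists already produce 16 dialogs. A learned model's job is to generalise beyond such an enumeration to orderings and phrasings nobody wrote down.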

"The Atom Tickets skill built with Alexa Conversations shrank almost 70 per cent, to just 1,700 lines of code, and needed only 13 customer dialog samples," said Amazon's storage marketing bod Drew Meyer.
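Taking "almost 70 per cent" as a round 70 per cent, the quoted figures imply the original skill ran to roughly 5,700 lines (a little less in practice, since the cut was slightly under 70 per cent):

```latex
N_{\text{before}} \approx \frac{N_{\text{after}}}{1 - 0.70} = \frac{1700}{0.30} \approx 5{,}700 \text{ lines}
```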

Another potentially more significant part of the announcement was that Alexa dialogues will be able to span multiple skills. Theatre booking agencies don't tend to run taxis, but customers often require both. Therefore, it makes sense for Alexa to combine skills so that when the user says, "Book a ticket for the 7pm performance and get me a taxi for it," Alexa could invoke skills from two different businesses. Devs will need to support new APIs that enable their skill to participate in these multi-skill interactions.
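Amazon has not detailed the new multi-skill APIs here, so the following is only a sketch of the general pattern under stated assumptions: the compound request has already been split into two tasks, each task goes to a registered skill, and a shared context lets the taxi skill reuse the venue and time the theatre skill produced. The vendors `TheatreCo` and `CabCo`, the venue and every function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, str]


@dataclass
class Skill:
    vendor: str
    handles: str                        # kind of task the skill fulfils
    run: Callable[[Context], Context]   # returns new facts to add to the context


def theatre_booking(ctx: Context) -> Context:
    # Hypothetical theatre skill: books the seat and reports the venue.
    return {"venue": "Grand Theatre", "ticket": f"booked for {ctx['show_time']}"}


def taxi_booking(ctx: Context) -> Context:
    # Hypothetical taxi skill: reuses the venue and time the theatre skill left behind.
    return {"taxi": f"to {ctx['venue']}, arriving before {ctx['show_time']}"}


REGISTRY: List[Skill] = [
    Skill("TheatreCo", "tickets", theatre_booking),
    Skill("CabCo", "taxi", taxi_booking),
]


def orchestrate(tasks: List[str], ctx: Context) -> Context:
    """Run each task with the first registered skill that handles it, sharing context."""
    for task in tasks:
        skill = next(s for s in REGISTRY if s.handles == task)
        ctx = {**ctx, **skill.run(ctx)}
    return ctx
```

Calling `orchestrate(["tickets", "taxi"], {"show_time": "7pm"})` books the seat and then a taxi "to Grand Theatre, arriving before 7pm" without the user repeating themselves. Note that `next(...)` simply takes the first skill registered for a task; that one line stands in for the skill-selection step whose commercial consequences are anything but simple.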

Multi-skill conversations handled by Alexa

"We envision a world where customers will converse more naturally with Alexa: seamlessly transitioning between skills, asking questions, making choices, and speaking the same way they would with a friend, family member, or co-worker," said Amazon's Rohit Prasad, Alexa VP and head scientist.

What's wrong with this picture? A couple of things. The first is that the nuances of human conversation, little signals that show approval or disapproval, and the way our butterfly minds flit from one angle to another without pause for thought, are challenging for bot engines to track. Do not expect Prasad's vision to be reality any time soon.

But that may be a good thing. The consequences for competition and choice if we take to purchasing stuff via voice assistants may be severe, and that is even before considering the privacy aspect of letting these things into our lives.

The problem is that voice assistants are so annoying to use that developers try to minimise the number of choices and confirmations they put to the user. Therefore, it is winner takes all for whatever Alexa comes up with when you order something generic like travel, everyday home items or local services like a plumber or electrician. You will not see the equivalent of the 10 blue links offered by search engines, which at least give some semblance of choice.

Which skill will Alexa choose in a multi-skill dialogue? Skill choice optimisation could be the next SEO as competing vendors fight desperately for Alexa's attention. It may be bad news for the best-known brands since, with no visuals or advertising blurb, it is harder to get customers to pay a premium. Amazon's increasing range of own-brand goods could be a winner here.

All this is speculative, but be in no doubt that the more this stuff takes off, the more intense will be the discussion around ethics and competition – probably after it has all gone wrong. ®
