Google has teased new bot technology aimed mainly at contact centres as part of its Cloud AI week in the seemingly unending Cloud Next OnAir videofest.
This includes the ability to train text-to-speech on a human voice so that you could create recordings that sound as if a specific person spoke them – a technology with obvious potential for abuse, though the company said all customers requesting this will be subject to ethical review.
Anthony Passemard, head of product for conversational AI, spoke at the virtual event on "Contact Center AI (CCAI) State of the union". He claimed that 31 per cent of CIOs have adopted "conversational AI platforms", which is itself a rise of 50 per cent year-on-year.
He also maintained that "by 2023, customers will prefer to use speech interfaces to initiate 70 per cent of self-service customer interactions, rising from 40 per cent in 2019", and that 40 per cent of contact centre interactions will be "fully automated by using AI, machine learning and self-service", up from 25 per cent in 2018. The figures quoted were from Gartner.
Contact Center AI is based on parsing voice input, bot-driven conversations, AI-driven assistance for human agents, and analytics after the event
The enthusiasm of companies to pay fewer human agents in call centres is understandable, though the extent to which their customers would welcome this is less clear. Actions like, for example, paying a bill are easy to automate, though most people use the web for this, and old-school number menus ("Press 1 for this, 2 for that") work equally well, or badly, in many cases. More gnarly problems are rarely solved via automated interaction, though Google argued that its technology can perform useful triage as well as guiding human agents after the call has been transferred.
CCAI is composed of three Google products. DialogFlow is a developer tool for building conversational AI applications, and at Cloud Next Passemard introduced DialogFlow CX, currently in beta, which he called "the next evolution" of the service. General availability is expected in a few months. New features in DialogFlow CX include a revamped user interface with a visual flow builder, up to 20,000 intents per agent, native support for sentiment (how angry you are), and support for Custom Voice, the text-to-speech voices that customers train themselves. DialogFlow CX also supports a DevOps pipeline with features including A/B testing of bots and intents.
Custom Voice lets customers make text to speech sound like a specific person after just 30 minutes of training
"Intents" are the pre-defined actions that the AI parses from voice input, such as "I'd like to pay my bill" or perhaps "why have I been charged twice?"
Google's speech-to-text technology is used by its Assistant and Google Home services as well as CCAI, so it is an area of significant investment. The company's presentation claimed a vocabulary 10 times the size of the Oxford English Dictionary, support for 120 languages, and real-time text conversion with after-the-fact correction based on context.
Just introduced is Speech-to-Text On-Prem, which runs in a company's data centre via Anthos, Google's hybrid cloud platform. Using Anthos, customers can have the data and processing remain on-premises, while still managing it from a cloud console, perhaps satisfying some regulatory requirements.
Agent Assist is AI-driven assistance for human agents (computer says no?) and provides them with a live transcription of all calls (following, no doubt, the standard warning that "this call may be recorded"), as well as searching for suitable knowledgebase articles or proposing related products for a sales pitch. The news here is that Google now offers Agent Assist for chat as well as voice, meaning the live chat option you now find on many customer support sites.
Passemard also introduced the alpha of CCAI Insights, a service for mining the data from calls so that managers or executives can analyse what people are calling about and the quality of their experience. The AI will analyse sentiment, not relying solely on the inevitable plea for customers to rate the service at the end of a call. The idea is that companies can find the worst cases and follow them up.
Text to speech is another key part of the service. It is easy to make an electronic voice read text, but not so easy to make it sound convincingly human, and this is where the technology is evolving. There are now 223 preset voices and 92 WaveNet voices, a premium service promising "human-like emphasis and inflection on syllables, phonemes and words". WaveNet voices are also used by Google Assistant.
The newly introduced Custom Voice allows customers to use a "voice actor" to train a bot to speak like the actor. Passemard demonstrated this capability by playing a recording of an actor, and then a digital voice, based apparently on just 30 minutes of recordings of the actor's voice, which sounded plausibly similar.
The company is aware, said Passemard, that people should "not be using the voice of an actor to do things that are not ethical". The technology, he said, "is available for any customer, following an ethics review". In an era of disinformation there is abundant potential for misuse, for example in politics or social media manipulation, but Google did not state its criteria for ethical use or how it will prevent misuse. That said, the ability to manipulate audio is nothing new and Custom Voice only makes this easier.
"We're not claiming we're going to be able to replace humans," said Passemard. "That's not the case. But we want to be close enough that we advise our customers to let people know they are talking to a bot, and not a human agent." ®