Eavesdropping appliances like Amazon Echo and software assistants like Google Now can be attacked using mangled words that get interpreted as commands, but humans hear as nonsense.
As explained in a 2015 paper [PDF], the phrase "Cocaine Noodles," for example, can be heard by Google Now as its command invocation, "OK, Google."
But that's the sort of phrase nearby people might notice and wonder about. It's not a particularly subtle mode of attack.
More recently, two Princeton University researchers – assistant professor Prateek Mittal and graduate student Liwei Song – developed a covert way to address attentive gear.
In a paper published on Thursday titled "Inaudible Voice Commands" [PDF], the tech boffins describe a technique for playing commands that people cannot hear but devices can.
By playing back voice commands at a frequency outside of the range of human hearing, the two IoT whisperers managed to direct Google Now, running on a Nexus 5X with Android 7.1.2, to take a picture and to enable Airplane Mode.
They also inaudibly asked Alexa, the Amazon Echo software agent, to add milk to a shopping list and to speak the current weather conditions.
Despite missing the opportunity to secretly order "Cocaine Noodles" from Amazon.com, the pair achieved success rates of 100 per cent at three metres with the Android phone, and 80 per cent at a distance of two metres with the Amazon Echo.
The attack technique takes advantage of the fact that microphones create new frequencies through signal distortion, a consequence of nonlinear audio processing.
"The adversary plays an ultrasound signal with spectrum above 20 kHz, which is inaudible to humans," the paper explains. "Then the victim device's microphone processes this input, but suffers from nonlinearity, causing the introduction of new frequencies in the audible spectrum. With careful design of the original ultrasound, these new audible frequencies recorded by the microphone are interpreted as actionable commands by voice assistant software."
Device whispering has some limitations. Song, in an email to The Register, said the technique has not been tested on devices trained to recognize specific voices. "However, if we can obtain a recording file of a device owner's voice, we think our attack method still works," he said.
Also, the attack was conducted with a dedicated speaker – not the sort of thing one can sneak into a room easily – and it hasn't been demonstrated using a mobile phone as a sound source. "It is an open question of using phones for attacking, since many phones cannot transmit high-frequency ultrasounds," said Song.
There's a video, for those who go for that sort of thing. ®