Voice control is all the rage these days, but a team of Chinese researchers has come up with a way to subvert such systems by taking a trick from the natural world.
Apps like Google Assistant and Siri are set to always be listening and ready for action, but shouting into someone else's phone is hardly subtle. So the team from Zhejiang University decided to take a standard voice command, convert it into the ultrasonic range so humans can't hear it, and see if the device could.
The method [PDF], dubbed DolphinAttack, takes advantage of the fact that puny human ears can't hear sounds well above 20kHz. So the team added an amplifier, ultrasonic transducer and battery to a regular smartphone (total cost in parts around $3) and used it to send ultrasonic commands to voice-activated systems.
[Image: Very little kit can have a big effect]
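In signal terms, the trick is amplitude modulation followed by accidental demodulation: the command rides on an ultrasonic carrier, and nonlinearity in the target's microphone circuit shifts it back down into the audible band. The sketch below is a toy illustration of that idea and not the researchers' code. The sample rate, the 30kHz carrier, the 1kHz tone standing in for a voice command, and the quadratic microphone model with its coefficients are all assumptions chosen to keep the arithmetic simple.

```python
import math

FS = 192_000        # simulation sample rate in Hz (assumed; well above 2x the carrier)
F_CARRIER = 30_000  # ultrasonic carrier in Hz (assumed, within the range the team used)
F_VOICE = 1_000     # single tone standing in for a voice command (assumed)
N = 19_200          # 0.1 s of samples, so each tone completes whole cycles


def tone_amplitude(signal, freq, fs=FS):
    """Amplitude of the `freq` component, found by correlating with cos and sin."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def modulate(depth=1.0):
    """Amplitude-modulate the stand-in voice tone onto the ultrasonic carrier."""
    return [
        (1 + depth * math.cos(2 * math.pi * F_VOICE * i / FS))
        * math.cos(2 * math.pi * F_CARRIER * i / FS)
        for i in range(N)
    ]


def microphone(signal, a1=1.0, a2=0.1):
    """Toy nonlinear microphone: a1*x + a2*x^2 (hypothetical coefficients)."""
    return [a1 * x + a2 * x * x for x in signal]


transmitted = modulate()
received = microphone(transmitted)

# What leaves the transducer has energy only around the carrier (29, 30, 31kHz),
# so the 1kHz component is essentially zero and nothing audible is sent.
print("1kHz in transmitted signal:", tone_amplitude(transmitted, F_VOICE))

# The squared term in the microphone model recreates the 1kHz tone at baseband.
print("1kHz after microphone:     ", tone_amplitude(received, F_VOICE))
```

In this model the recovered tone scales with the quadratic coefficient `a2`, which is why the attack depends on the microphone hardware and not on the speech recognition software. A real attack would modulate a full recorded command, and a real microphone chain would also include filtering that this sketch leaves out.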
"By leveraging the nonlinearity of the microphone circuits, the modulated low-frequency audio commands can be successfully demodulated, recovered, and more importantly interpreted by the speech recognition systems," they said.
"We validate DolphinAttack on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana and Alexa." The in-car navigation system in Audi cars was also vulnerable in this way.
Because voice control has lots of possible functions, the team was able to order an iPhone to dial a specific number – which is handy but not that useful as an attack. But they could also instruct a device to visit a specific website – which could be loaded up with malware – dim the screen and volume to hide the assault, or just take the device offline by putting it in airplane mode.
The biggest brake on the attack isn't the voice command software itself, but the audio hardware of the target device. Many smartphones now have multiple microphones, which makes an assault much more effective.
As for range, the greatest distance at which the team got the attack to work was 170cm (about 5.6ft), which is certainly practical. Typically the signal was sent out at between 25 and 39kHz.
The full research will be presented at the ACM Conference on Computer and Communications Security next month in Dallas, Texas. ®