Is that you, HAL? AI can now see secrets through lipreading – kinda

LipNet's got potential but also a loooong way to go


AI surveillance could be about to get a lot more advanced, as researchers move on from using neural networks for facial recognition to lipreading.

A paper from researchers at the University of Oxford, Google DeepMind, and the Canadian Institute for Advanced Research, under review for ICLR 2017 (the International Conference on Learning Representations), an academic machine-learning conference, describes a neural network called “LipNet.”

LipNet can decipher which words have been spoken by analyzing the “spatiotemporal visual features” of someone speaking on video, reaching 93.4 per cent accuracy – beating professional human lipreaders.

It’s the first model to go beyond simple word classification to sentence-level sequence prediction, the researchers claimed.

Lipreading is a difficult task, even for people with hearing loss, who score an average accuracy rate of 52.3 per cent.

“Machine lipreaders have enormous practical potential, with applications in improved hearing aids, silent dictation in public spaces, covert conversations, speech recognition in noisy environments, biometric identification, and silent-movie processing,” the paper said.

But for those afraid of CCTV cameras snooping on secret conversations: don’t throw away the funky pixel-distorting glasses that can mask your identity just yet.

A closer look at the paper reveals that the impressive accuracy rate applies only to a limited dataset of words strung together into sentences that often make no sense, as in the example in the video below.

YouTube video

The GRID corpus is a series of audio and video recordings of 34 speakers, each of whom speaks 1,000 sentences. The sentences all follow the same “simple grammar”: command(4) + color(4) + preposition(4) + letter(25) + digit(10) + adverb(4).

The numbers in brackets show how many word choices there are in each category, giving 64,000 possible sentences. Many files in the GRID corpus were missing or corrupted, leaving 32,839 usable videos from 13 speakers.
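For a sense of scale, here is a short Python sketch of that grammar. The category counts come from the paper; the word lists below are the commonly cited GRID vocabulary, included purely for illustration.

    import random

    # Word categories in GRID's "simple grammar". The counts match the
    # paper; the word lists themselves are the commonly cited GRID
    # vocabulary, shown here only for illustration.
    commands     = ["bin", "lay", "place", "set"]                    # 4
    colors       = ["blue", "green", "red", "white"]                 # 4
    prepositions = ["at", "by", "in", "with"]                        # 4
    letters      = list("abcdefghijklmnopqrstuvxyz")                 # 25 (no 'w')
    digits       = ["zero", "one", "two", "three", "four",
                    "five", "six", "seven", "eight", "nine"]         # 10
    adverbs      = ["again", "now", "please", "soon"]                # 4

    categories = [commands, colors, prepositions, letters, digits, adverbs]

    # 4 * 4 * 4 * 25 * 10 * 4 = 64,000 possible sentences
    total = 1
    for category in categories:
        total *= len(category)
    print(total)  # 64000

    # A typical, perfectly grammatical and perfectly nonsensical sentence
    print(" ".join(random.choice(category) for category in categories))
    # e.g. "place red at g nine now"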

LipNet needs a lot of training to reach such high accuracy: of those videos, roughly 88 per cent were used for training and 12 per cent for testing. The system focuses on the various shapes the speaker’s mouth makes as he or she talks, breaking each video down into image frames.

These frames are then fed into the neural network as input and passed through several layers that map the mouth movements to phonemes, working out the words and sentences phonetically.

LipNet mapping frames into phonemes and words (Photo credit: Assael et al)
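To make that pipeline concrete, here is a minimal PyTorch sketch of the kind of architecture described: spatiotemporal (3D) convolutions over cropped mouth frames, a recurrent layer over time, and a per-frame classifier over a small token vocabulary (characters or phonemes). The layer sizes, vocabulary, and class name are illustrative assumptions, not the paper’s actual configuration.

    import torch
    import torch.nn as nn

    class LipReaderSketch(nn.Module):
        """Illustrative lipreading model: 3D convolutions capture mouth
        motion across frames, a GRU models the sequence, and a linear
        layer scores a small token vocabulary at every time step."""

        def __init__(self, vocab_size=28):  # assumption: 26 letters + space + blank
            super().__init__()
            # Spatiotemporal (3D) convolutions over (time, height, width)
            self.conv = nn.Sequential(
                nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                nn.ReLU(),
                nn.MaxPool3d((1, 2, 2)),
                nn.Conv3d(32, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                nn.ReLU(),
                nn.MaxPool3d((1, 2, 2)),
            )
            self.gru = nn.GRU(input_size=64, hidden_size=128,
                              batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(256, vocab_size)

        def forward(self, frames):
            # frames: (batch, channels, time, height, width)
            feats = self.conv(frames)          # (B, 64, T, H', W')
            feats = feats.mean(dim=[3, 4])     # pool away space -> (B, 64, T)
            feats = feats.transpose(1, 2)      # (B, T, 64)
            out, _ = self.gru(feats)           # (B, T, 256)
            return self.classifier(out)        # per-frame token scores

    # Example: a batch of two clips, each 75 RGB frames of a 50x100 mouth crop
    model = LipReaderSketch()
    clips = torch.randn(2, 3, 75, 50, 100)
    print(model(clips).shape)  # torch.Size([2, 75, 28])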

LipNet is still a long way from handling real, everyday conversations between two people. The system will need far more training data to cope with different accents and languages.

But if you’re still worried about cameras deciphering your whispers, maybe wear a mask. ®

