2016. AI boffins picked a hell of a year to train a neural net by making it watch the news

Lipreading software must think us humans are maniacs

Lipreading AI from the University of Oxford and DeepMind, the pairing behind the LipNet lipreading network, can now lipread from TV shows better than professional lipreaders.

The first LipNet paper, currently under review for the International Conference on Learning Representations (ICLR 2017), a machine learning conference, was criticised for testing LipNet’s accuracy on a limited dataset: the GRID corpus, which is made up of sentences that follow a strict word order and make no sense on their own.

The second paper, released on arXiv, is a better indication of a machine’s superior lipreading abilities, as it tests the system on hours of mouth movements from speakers on the BBC’s News, Question Time, Breakfast and Newsnight UK TV shows.

It’s an “open-world problem,” where sentences are unconstrained in content and length, and representative of natural human speech.

The Watch, Listen, Attend and Spell (WLAS) network described in the second paper has a lower word accuracy than LipNet, at 46.8 per cent compared to 93.4 per cent. But it is a more complex model tackling a more difficult task.

It works by combining two encoders, an image encoder that watches the mouth movements of a talking face and an audio encoder that listens to the soundtrack, with a character decoder that attends to both and spells out the words being spoken.
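To make the encoder/decoder split concrete, here is a toy Python sketch of a single decoder step that attends separately to the lip (video) encoder outputs and the audio encoder outputs, then concatenates the two context vectors. It assumes plain dot-product attention over hand-made vectors; the function names are hypothetical, and the real WLAS network’s scoring function, layer sizes and training details are not reproduced here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, states: List[Vector]) -> Tuple[Vector, List[float]]:
    """Dot-product attention: weight each encoder state by its similarity
    to the decoder's query, then return the weighted average (context)."""
    weights = softmax([dot(query, s) for s in states])
    dim = len(states[0])
    context = [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]
    return context, weights


def dual_attention_step(decoder_state: Vector,
                        video_states: List[Vector],
                        audio_states: List[Vector]) -> Vector:
    """Hypothetical decoder step: attend to the video and audio encoder
    outputs independently, then concatenate the two context vectors."""
    video_ctx, _ = attend(decoder_state, video_states)
    audio_ctx, _ = attend(decoder_state, audio_states)
    return video_ctx + audio_ctx  # list concatenation, not addition


# Toy inputs: 2 video time steps and 3 audio time steps.
ctx = dual_attention_step([0.0, 0.0],
                          [[1.0, 0.0], [0.0, 1.0]],
                          [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]])
print(ctx)
```

Because each modality gets its own attention weights in this sketch, the video and audio encoders do not need to produce the same number of time steps, which matters when video frames and audio features arrive at different rates.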

Researchers sampled the TV shows to create the Lip Reading Sentences (LRS) dataset for visual speech recognition. The WLAS network analysed the mouth movements in the LRS dataset, which contains over 100,000 natural sentences and a vocabulary of 17,428 words.

Top row: Original still images from the BBC lipreading dataset – News, Question Time, Breakfast, Newsnight (from left to right). Bottom row: The mouth motions for ‘afternoon’ from two different speakers. The network sees the areas inside the red squares. (Photo credit: University of Oxford and Google DeepMind)

Like LipNet, WLAS still requires a lot of training, and only a small part of the LRS dataset is used to test the network. Of the 17,428 words in the vocabulary, 6,882 appear in the test set, but 6,253 of those had already been encountered during training and validation.
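The overlap figure is a simple set calculation. The sketch below counts how many distinct test-set words were also seen in training; the function name and the toy transcripts are made up for illustration and are not the LRS data.

```python
from typing import Iterable, NamedTuple


class Overlap(NamedTuple):
    test_vocab: int       # distinct words in the test set
    seen: int             # of those, words also present in training
    unseen: int           # test words never seen in training
    seen_fraction: float  # seen / test_vocab


def vocab_overlap(train_words: Iterable[str], test_words: Iterable[str]) -> Overlap:
    """Hypothetical helper: compare the distinct words of two word streams."""
    train_vocab = set(train_words)
    test_vocab = set(test_words)
    seen = test_vocab & train_vocab
    unseen = test_vocab - train_vocab
    return Overlap(len(test_vocab), len(seen), len(unseen), len(seen) / len(test_vocab))


# Toy transcripts, invented for this example.
train = "good afternoon and welcome to the news".split()
test = "good evening and welcome to question time".split()
result = vocab_overlap(train, test)
print(result.seen, result.unseen)  # 4 3

# The article's LRS figures: 6,253 of 6,882 test-set words seen in training.
print(round(6253 / 6882, 3), 6882 - 6253)  # 0.909 629
```

In other words, roughly 91 per cent of the distinct test-set words had already been seen in training, leaving 629 that were new.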

The model was trained from scratch. When researchers tried to train the WLAS network on whole sentences, learning was too slow and the network struggled to extract the relevant information, so the sentences had to be broken down into single words.

As the network learns more words over time, it can gradually piece them together into longer sequences, and eventually into the full sentences in the dataset.
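Starting on single words and lengthening the examples as training progresses is a form of curriculum learning. Below is a minimal sketch of such a schedule, assuming each sentence is already a list of words that can be cut at word boundaries (with video, that would need word-level timings). The stage lengths and function names are illustrative assumptions, not the researchers’ settings.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def crop(words: Sequence[str], max_words: int, rng: random.Random) -> List[str]:
    """Return a random contiguous span of at most max_words words."""
    if len(words) <= max_words:
        return list(words)
    start = rng.randrange(len(words) - max_words + 1)
    return list(words[start:start + max_words])


def curriculum(sentences: Sequence[Sequence[str]],
               stages: Sequence[int],
               seed: int = 0) -> Iterator[Tuple[int, List[str]]]:
    """Hypothetical schedule: yield (max_words, example) pairs, starting
    with single words and allowing longer spans at each later stage."""
    rng = random.Random(seed)
    for max_words in stages:
        for sentence in sentences:
            yield max_words, crop(sentence, max_words, rng)


sentences = [["good", "afternoon", "and", "welcome"], ["hello"]]
for max_words, example in curriculum(sentences, stages=[1, 2, 10]):
    print(max_words, example)
```

By the last stage the maximum length exceeds every sentence, so the network sees whole sentences, matching the progression described above.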

On lipreading alone, WLAS’s word error rate of 53.2 per cent is considerably better than the 73.8 per cent managed by professional lipreaders. Lipreading is difficult because of homophenes, words that sound different but look the same when spoken: for example, it can be hard to tell whether a word begins with a ‘p’ or a ‘b’.
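Word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between a system’s transcript and the reference, divided by the number of reference words; the 46.8 per cent word accuracy quoted for WLAS is simply 100 minus its 53.2 per cent error rate. A minimal Python sketch, using a made-up p/b confusion as the example; the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with the standard Levenshtein dynamic programme over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# Two p/b homophene swaps plus one dropped word: 3 errors over 6 words.
print(word_error_rate("put the pear on the table", "but the bear on table"))  # 0.5
```

Because insertions count as errors, WER can exceed 100 per cent, which is why it is reported as an error rate rather than derived from a simple count of correct words.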

The system struggles with noise, however: when it has to transcribe noisy audio clips on their own, the word error rate shoots up to 74.5 per cent. When audio and lipreading are combined, the word error rate falls to 50.8 per cent.

A lipreading neural network has plenty of potential applications. The researchers are interested in developing better hearing aids, but it might also be put to more sinister ends, such as eavesdropping on conversations captured by CCTV cameras. ®


Biting the hand that feeds IT © 1998–2022