Deepfake 3.0 (beta), the bad news: This AI can turn ONE photo of you into a talking head. Good news: There is none

Watch Rasputin sing like, uh, Beyonce, Einstein natter away...


Remember that artificially intelligent software that could transform lifeless still images, such as portrait paintings, into moving heads? Well, you can now take a single photo or picture of someone and animate it to make them say specific words and sentences, using AI algorithms.

This machine-learning code can take a person's mouth and lip-synch it to a given spoken-word audio track, effectively forcing the subject to speak the supplied recording and say things they never actually uttered. The ways in which this could be abused to trick audiences are endless.

This new development, like the research preceding it, feeds into the hand-wringing frenzy over deepfakes, a term that describes content, whether it be images, videos, or audio, that has been doctored and twisted by machine-learning algorithms.

The internet freaked out over portraits of Mona Lisa and photos of dead celebrities like Marilyn Monroe suddenly coming to life, reanimated by the cold clammy hands of neural networks and code. Their eyes blinked, and their mouths moved, but no sound came out.

Now, researchers at the Samsung AI Center and Imperial College London in the United Kingdom have gone one step further. They have created fake talking heads that really can speak. Listen to Einstein discussing the wonders of science below. Yes, it's his face and his voice, but it's still fake, and clearly fake, nevertheless.

Youtube Video

The audio was sourced from a recording of a speech by the E=mc² super-boffin, and his face is from a photograph. Here's one that's more obviously bogus: it's a photograph of Grigori Rasputin singing popstar Beyonce's smash hit Halo...

Youtube Video

The images are pretty grainy, obviously manipulated in some way, and they’re amusing enough to not really be taken seriously. However, here’s another clip that shows why this type of technology is potentially dangerous:

Youtube Video

Normal people like you or me can therefore be visually manipulated, and the doctoring is not always obvious. In the video above, people's faces are animated by the AI software to repeat neutral sentences such as “it’s eleven o’clock” or “I’m on my way to the meeting” with a range of facial expressions, from happy and sad to scared.

Right now, these videos, produced as a result of early academic research, are impressive from a technical standpoint, though ultimately not always entirely convincing.

However, imagine a future in which these fake computer-crafted videos are good enough to fool enough of the population to spread fake news, or doctor evidence to frame people for crimes they haven’t committed – all automatically at the press of a few buttons.

Generators and discriminators

As we've said, the output of the technology described in the team's arXiv paper, emitted this month, isn't entirely convincing yet. The resulting video footage is low quality, and lacks subtle facial movements and features, such as the small wrinkles that pool around the nose and lips when real people natter away. The eyes are also lacklustre.

However, considering that the model can create a talking head from just a single input image and audio file, it’s not too bad at this stage. The researchers built the software on top of a generative adversarial network (GAN) that featured one generator and three discriminator networks. This approach pitted the generator against the trio of discriminators: the generator has to produce streams of material, from input pictures and audio, that is convincing enough to get past the discriminators.
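To make the generator-versus-three-discriminators arrangement concrete, here's a minimal, hypothetical Python sketch. The functions and their scores are stand-ins invented for illustration, not the researchers' actual PyTorch models; the point is only the structure: one generator whose output must satisfy all three critics at once.

```python
# Hypothetical sketch of the one-generator / three-discriminator setup.
# All functions and scores below are illustrative stand-ins, not the
# authors' actual networks.

def generator(image, audio):
    # Stand-in: a real model would emit video frames; here we just pair
    # the input picture with each audio chunk so the "discriminators"
    # below have something to score.
    return [("frame", image, chunk) for chunk in audio]

def frame_discriminator(frames):     # does each frame look like a real face?
    return 0.9 if frames else 0.0

def sync_discriminator(frames):      # do lip movements match the audio?
    return 0.8 if frames else 0.0

def sequence_discriminator(frames):  # are frame-to-frame transitions smooth?
    return 0.7 if frames else 0.0

def generator_score(image, audio):
    # The generator's training signal combines all three critics'
    # verdicts: it must fool each of them to score well.
    frames = generator(image, audio)
    scores = [d(frames) for d in
              (frame_discriminator, sync_discriminator, sequence_discriminator)]
    return sum(scores) / len(scores)

print(round(generator_score("portrait.png", ["a1", "a2", "a3"]), 2))  # → 0.8
```

In a real GAN, of course, both sides are neural networks trained in opposition: the discriminators learn to spot fakes while the generator learns to beat them.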

The discriminators therefore had to be taught to differentiate between real and fake videos “based on the synchrony or the presence of natural facial expressions,” according to the paper. A total 164,109 samples taken from four datasets of people speaking were used to train the model, and 17,753 clips were used for testing.

A diagram of the different components in the model ... Image credit: Vougioukas et al.

During training, the generator took a still input picture and an audio clip, and from these two sources outputted a series of frames derived from that input snap, with each frame corresponding to a 0.2-second snippet from the input audio. In each frame, the mouth and face were slightly altered to match the associated brief audio sample.
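The frame-to-audio pairing above can be sketched as a simple windowing calculation. The 25 fps frame rate, 16 kHz sample rate, and centred window placement are assumptions made for this illustration; only the 0.2-second snippet length comes from the article. Note that at 25 fps the frames arrive every 0.04 s, so consecutive 0.2 s windows overlap heavily.

```python
# Hypothetical illustration of pairing each video frame with a
# 0.2-second audio snippet. The frame rate, sample rate, and window
# alignment are assumptions, not taken from the paper.

SAMPLE_RATE = 16_000   # audio samples per second (assumed)
FPS = 25               # video frames per second (assumed)
WINDOW_SEC = 0.2       # audio context per frame, from the article

def audio_window_for_frame(frame_idx):
    """Return (start, end) sample indices of the 0.2 s audio snippet
    centred on the frame's timestamp (clamped at zero)."""
    centre = frame_idx / FPS
    start = max(0.0, centre - WINDOW_SEC / 2)
    return int(start * SAMPLE_RATE), int((start + WINDOW_SEC) * SAMPLE_RATE)

print(audio_window_for_frame(0))    # → (0, 3200)
print(audio_window_for_frame(10))   # → (4800, 8000)
```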

Those frames were then passed into two of the discriminators, which checked that the audio and lip movements were aligned; if not, the stream was rejected as fake or unrealistic, with feedback passed to the generator so that it could improve. The third, a sequence discriminator, looked at the video as a whole to check that the transitions between frames were smooth and the generated clip looked realistic; if not, it was rejected, and the generator informed.

Once training is complete, the GAN should be good enough to take any input image and audio and synch them up into a deepfake talking head video.

The still images and audio fed into the generator were encoded by two separate convolutional neural networks. To top it all off, there was also a noise generator in the mix to generate filler frames containing eye blinking and other facial motions.

“Our model is implemented in PyTorch and takes approximately a week to train using a single Nvidia GeForce GTX 1080 Ti GPU,” the researchers wrote in their paper. Fake clips of talking heads can be created in real time: a video containing about 75 frames can be generated in just half a second using a GTX 1080 Ti GPU, though it takes longer if a CPU is used.
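The real-time claim checks out with some back-of-the-envelope arithmetic. The 75 frames and 0.5-second figures come from the article; the 25 fps playback rate is an assumption for the sketch.

```python
# Back-of-the-envelope check of the real-time generation claim.
# frames and generation_time_s are from the article; playback_fps
# is an assumed playback rate.

frames = 75
generation_time_s = 0.5
playback_fps = 25  # assumed

generation_fps = frames / generation_time_s
video_duration_s = frames / playback_fps
speedup = video_duration_s / generation_time_s

print(generation_fps)    # → 150.0 frames generated per second
print(video_duration_s)  # → 3.0 seconds of video
print(speedup)           # → 6.0x faster than real time
```

In other words, on that GPU the model churns out roughly six seconds of footage for every second of wall-clock time, comfortably inside real-time territory.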

When the researchers asked 66 people to watch 24 videos – 12 real and 12 deepfake – the viewers could only label them correctly about 52 per cent of the time. “This model has shown promising results in generating lifelike videos, which produce facial expressions that reflect the speakers tone. The inability of users to distinguish the synthesized videos from the real ones in the Turing test verifies that the videos produced look natural,” the researchers concluded.
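That 52 per cent figure is barely better than flipping a coin. A rough calculation, assuming each of the 66 viewers judged all 24 videos independently (which overstates the precision, since judgements from the same viewer are correlated), shows the result sits within about 1.6 standard errors of pure chance:

```python
import math

# Rough sanity check: is 52% correct distinguishable from 50/50 guessing?
# Assumes 66 viewers x 24 videos = 1584 independent judgements, which
# is an idealisation.

viewers, videos = 66, 24
n = viewers * videos                 # 1584 judgements
p_hat = 0.52                         # observed accuracy, from the article
se = math.sqrt(0.5 * 0.5 / n)        # standard error under pure guessing
z = (p_hat - 0.5) / se

print(n, round(se, 4), round(z, 2))  # → 1584 0.0126 1.59
```

A z-score of about 1.6 falls short of the conventional 1.96 threshold, which supports the researchers' reading that viewers could not reliably tell the fakes from the real clips.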

They hope to make their results more convincing in future with more realistic movements. At the moment, for instance, the fake talking heads can’t really move their heads much. ®
