Fight AI with AI! Code taught to finger naughty deepfake vids made by machine-learning algos

It works for now because the forgeries are quite easy to spot

The rise of AI systems that can generate fake images and videos has spurred researchers in the US to develop a technique to sniff out these cyber-shams, also known as deepfakes.

Generative Adversarial Networks (GANs) are commonly used for creative purposes. These neural networks have helped researchers create made-up data to train artificially intelligent software when there is a lack of training material, and have also assisted artists in creating portraits.

However, like anything tech-related, there is also a sinister side. The technology has been abused by miscreants to paste the faces of actresses, ex-girlfriends, politicians, and other victims, onto the bodies of porn stars. The result is fairly realistic, computer-generated video of people seemingly performing X-rated acts. The fear is this will go beyond fake smut, and into the realms of forged interviews and confessions, especially when combined with faked AI-generated audio.

Now, PhD student Yuezun Li and Siwei Lyu, an associate computer-science professor at the University at Albany, State University of New York, have come up with a technique that attempts to identify deepfake videos, such as those crafted by the open-source DeepFake FaceSwap algorithm.

Deepfakes are, for now, not hard for humans to spot. The doctored videos are uncanny, the facial expressions aren’t very natural, and any motion is pretty laggy and glitchy. They also have a lower resolution than the source material. Thus, people should be able to realize they are being hoodwinked within a few seconds. However, as the technology improves, it would be nice if machines could be taught the tell-tale signs of these forgeries so as to alert unaware folks in future.

Detecting deepfakes

Previous attempts to make computers do the job looked at things like the way people blink in videos for signs of any shenanigans. This often required generating deepfakes with GANs first to train other neural network systems used in the detection process.

Li and Lyu’s method, however, doesn’t rely on GANs, and is therefore less time consuming and less computationally intensive. First, they used traditional computer vision techniques to detect faces in 24,442 training images, and extract the facial landmarks.
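The paper doesn't reproduce its preprocessing code, but the landmark step can be sketched in a few lines: given a handful of detected landmark coordinates, you solve for an affine transform that maps them onto canonical positions, which is how face regions are typically aligned before further processing. The landmark positions below are made up for illustration.

```python
import numpy as np

def affine_from_landmarks(src, dst):
    """Solve for a 2x3 affine matrix mapping three source landmarks
    (e.g. eye corners and nose tip) onto canonical destination positions."""
    # For each point, [x, y, 1] @ X = [x', y']; stack three points and solve.
    src_h = np.hstack([src, np.ones((3, 1))])   # 3x3 homogeneous coordinates
    return np.linalg.solve(src_h, dst).T        # 2x3 affine matrix

def apply_affine(A, pts):
    """Apply the affine transform A to an (n, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return pts_h @ A.T

# Hypothetical landmarks (left eye, right eye, nose tip) found in a face crop
detected = np.array([[102.0, 80.0], [158.0, 82.0], [130.0, 120.0]])
canonical = np.array([[30.0, 30.0], [70.0, 30.0], [50.0, 60.0]])

A = affine_from_landmarks(detected, canonical)
aligned = apply_affine(A, detected)   # lands exactly on the canonical points
```

Real pipelines use dozens of landmarks and a least-squares fit rather than an exact three-point solve, but the idea is the same.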

Next, they warped and twisted the facial features in the images to mimic the eerie effects often seen in deepfake vids. Finally, they trained convolutional neural networks (CNNs) on the real and disfigured images to develop classifiers that could at least attempt to detect the probability of a scene being genuine or not. After training, screenshots from videos were then fed into these networks, which indicated whether the faces in the images were likely real or manipulated.

“Our method is based on the observations that current DeepFake algorithm can only generate images of limited resolutions, which need to be further warped to match the original faces in the source video,” they explained in a paper emitted this month.

"Such transforms leave distinctive artifacts in the resulting DeepFake videos, and we show that they can be effectively captured by convolutional neural networks."

The duo applied the aforementioned technique to four different CNNs. The evaluation set contained 49 real videos and 49 DeepFake-generated videos. Each vid featured a single subject, and lasted for about 11 seconds. There were 32,752 frames in total.


VGG16, an old CNN system developed by researchers at the University of Oxford in the UK, performed the worst at detecting deepfake images (83.3 per cent accuracy), compared to ResNet50 (97.4 per cent), a more popular CNN built by Microsoft researchers.

Two other Microsoft variants, ResNet101 and ResNet152, came second (95.4 per cent) and third (93.8 per cent), respectively. For deepfake videos as a whole, ResNet101 was best (99.1 per cent), followed by ResNet50 (98.7 per cent) and ResNet152 (97.8 per cent), with VGG16 again last (84.5 per cent).
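The video-level scores beat the image-level ones across the board, which makes sense: pooling many per-frame probabilities smooths out individual misclassifications. A minimal sketch of that idea, assuming simple averaging as the aggregation rule (the paper's exact rule may differ), with made-up per-frame scores:

```python
def video_score(frame_probs):
    """Pool per-frame 'fake' probabilities into one video-level score
    by simple averaging (the aggregation rule here is an assumption)."""
    return sum(frame_probs) / len(frame_probs)

def classify_video(frame_probs, threshold=0.5):
    """Label the whole clip from its pooled score."""
    return "fake" if video_score(frame_probs) > threshold else "real"

# A noisy per-frame classifier that gets a few frames wrong can still
# label the whole clip correctly once its scores are pooled.
noisy_frames = [0.9, 0.8, 0.3, 0.7, 0.6, 0.4, 0.85, 0.75]
print(classify_video(noisy_frames))   # pooled score 0.6625 -> "fake"
```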

Although promising, the researchers are yet to report meaningful results on deepfake videos and images beyond their carefully curated DeepFake dataset. More testing is needed on real-world forgeries, in other words. Plus, as the quality of GANs and fake content improves, it’ll become harder to detect forgeries using this method, we reckon.

“As the technology behind DeepFake keeps evolving, we will continu[e to] improve the detection method," the academics noted. "First, we would like to evaluate and improve the robustness of our detection method with regards to multiple video compression.

"Second, we [are] currently using predesigned network structure[s] for this task (e.g, resnet or VGG), but for more efficient detection, we would like to explore dedicated network structure[s] for the detection of DeepFake videos." ®
