The rise of AI systems that can generate fake images and videos has spurred researchers in the US to develop a technique to sniff out these cyber-shams, also known as deepfakes.
Generative Adversarial Networks (GANs) are commonly used for creative purposes. These neural networks have helped researchers create made-up data to train artificially intelligent software when there is a lack of training material, and have also assisted artists in creating portraits.
However, like anything tech-related, there is also a sinister side. The technology has been abused by miscreants to paste the faces of actresses, ex-girlfriends, politicians, and other victims, onto the bodies of porn stars. The result is fairly realistic, computer-generated video of people seemingly performing X-rated acts. The fear is this will go beyond fake smut, and into the realms of forged interviews and confessions, especially when combined with faked AI-generated audio.
Now, PhD student Yuezun Li and Siwei Lyu, an associate computer-science professor at the State University of New York at Albany, have come up with a technique that attempts to identify deepfake videos, such as those crafted by the open-source DeepFake FaceSwap algorithm.
Deepfakes are, for now, not hard for humans to spot. The doctored videos are uncanny, the facial expressions aren’t very natural, and any motion is pretty laggy and glitchy. They also have a lower resolution than the source material. Thus, people should be able to realize they are being hoodwinked after more than a few seconds. However, as the technology improves, it would be nice if machines could be taught the tell-tale signs of these forgeries so as to alert unaware folks in future.
Previous attempts to make computers do the job looked at things like the way people blink in videos for signs of any shenanigans. This often required generating deepfakes with GANs first to train other neural network systems used in the detection process.
Li and Lyu’s method, however, doesn’t rely on GANs, and is therefore less time consuming and less computationally intensive. First, they used traditional computer vision techniques to detect faces in 24,442 training images, and extract the facial landmarks.
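The paper relies on standard computer-vision tooling for face and landmark detection, so the code below is not the authors' pipeline — it is a minimal NumPy sketch of the alignment sub-step, where a transform is fitted that maps one set of facial landmarks onto another. The three example points and the function name are illustrative assumptions:

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares affine transform mapping src landmarks onto dst.

    src_pts, dst_pts: (N, 2) arrays of matching landmark coordinates.
    Returns a 2x3 matrix A such that dst ~= [x, y, 1] @ A.T.
    """
    n = src_pts.shape[0]
    # Homogeneous coordinates: append a column of ones.
    src_h = np.hstack([src_pts, np.ones((n, 1))])
    # Solve src_h @ A.T ~= dst_pts in the least-squares sense.
    a_t, *_ = np.linalg.lstsq(src_h, dst_pts, rcond=None)
    return a_t.T

# Hypothetical landmarks (eye corners and nose tip, say) in two images.
src = np.array([[30.0, 40.0], [70.0, 40.0], [50.0, 70.0]])
dst = src * 0.5 + np.array([10.0, 5.0])   # same face, shrunk and shifted
A = estimate_affine(src, dst)
mapped = np.hstack([src, np.ones((3, 1))]) @ A.T
print(np.allclose(mapped, dst))  # True: the fitted transform maps src onto dst
```

In a real implementation the landmarks would come from an off-the-shelf detector such as dlib's 68-point shape predictor, and the fitted transform would be applied to whole pixel regions rather than just the points.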
Next, they warped and twisted the facial features in the images to mimic the eerie effects often seen in deepfake vids. Finally, they trained convolutional neural networks (CNNs) on the real and disfigured images to develop classifiers that could at least attempt to estimate the probability of a scene being genuine or not. After training, screenshots from videos were fed into these networks, which indicated whether the faces in the images were likely real or manipulated.
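The key trick is that negative training examples can be manufactured from real images alone, by putting a face patch through a lossy round trip. As a toy stand-in for that degradation step — not the authors' actual code, and using block averaging where they use proper warping — the idea looks like this:

```python
import numpy as np

def simulate_deepfake_artifact(face, factor=4):
    """Mimic the low-res-then-warped look of a deepfake face region.

    Down-samples the patch by block averaging, then blows it back up by
    pixel repetition. The round trip discards high-frequency detail --
    the kind of artifact a CNN classifier can learn to spot.
    """
    h, w = face.shape
    # Block-average downsample (assumes h and w divide evenly by factor).
    small = face.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    # Nearest-neighbour upsample back to the original size.
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
face = rng.random((64, 64))          # stand-in for a greyscale face crop
degraded = simulate_deepfake_artifact(face)
residual = np.abs(face - degraded).mean()
print(residual > 0)                  # True: detail is lost in the round trip
```

The CNN then plays spot-the-difference: it is trained to separate untouched patches from ones that have been through a degradation like this, which is far cheaper than generating deepfakes with GANs to use as training data.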
“Our method is based on the observations that current DeepFake algorithm can only generate images of limited resolutions, which need to be further warped to match the original faces in the source video,” they explained in a paper emitted this month.
"Such transforms leave distinctive artifacts in the resulting DeepFake videos, and we show that they can be effectively captured by convolutional neural networks."
The duo applied the aforementioned technique to four different CNNs. The training set contained 49 real videos and 49 DeepFake-generated videos. Each vid featured a single subject, and lasted for about 11 seconds. There were 32,752 frames in total.
VGG16, an old CNN system developed by researchers at the University of Oxford in the UK, performed the worst at detecting deepfake images (with 83.3 per cent accuracy), compared to ResNet50 (97.4 per cent), a more popular CNN built by Microsoft researchers.
Other variants, Microsoft's ResNet101 and ResNet152, came second (95.4 per cent) and third (93.8 per cent), respectively. For deepfake videos as a whole, ResNet101 was best (99.1 per cent), followed by ResNet50 (98.7 per cent) and ResNet152 (97.8 per cent), with VGG16 last (84.5 per cent).
Although promising, the researchers have yet to report meaningful results on deepfake videos and images beyond their carefully curated DeepFake dataset. More testing is needed on real-world forgeries, in other words. Plus, as the quality of GANs and fake content improves, it’ll become harder to detect forgeries using this method, we reckon.
“As the technology behind DeepFake keeps evolving, we will [continue improving] the detection method," the academics noted. "First, we would like to evaluate and improve the robustness of our detection method with regards to multiple video compression.
"Second, we [are] currently using predesigned network structure[s] for this task (e.g, resnet or VGG), but for more efficient detection, we would like to explore dedicated network structure[s] for the detection of DeepFake videos." ®