Facebook, academics think they've cracked spotting deepfakes by working out how they're generated
99% accuracy in testing
Computer scientists have built prototype software capable of not only detecting fake images forged by neural networks but also estimating the properties of the model used to generate these so-called deepfakes.
The project is a collaboration between academics at Michigan State University (MSU) in America and Facebook AI Research (FAIR), the aim being to make a tool that can tackle campaigns of coordinated disinformation and the like.
Typical machine-learning models trained to sniff out deepfakes work by predicting whether, say, a given image was electronically manipulated. The MSU-FAIR system, however, goes deeper, and can suggest the architecture of the neural network used to create the images, which could be useful for grouping together images and attributing them to a particular disinformation campaign. In other words, you could take 100,000 images and, using the MSU-FAIR code, determine 2,000 of them were created by someone with a model of type A, and 3,000 were created by another group using model type B, etc.
“Our reverse engineering method relies on uncovering the unique patterns behind the AI model used to generate a single deepfake image,” said Tal Hassner, a FAIR researcher and co-author of a research paper describing the system.
“We can estimate properties of the generative models used to create each deepfake, and even associate multiple deepfakes to the model that possibly produced them. This provides information about each deepfake, even ones where no prior information existed.”
The software could be used to root out online accounts created automatically, with deepfake avatar pictures, to post made-up positive or negative reviews for products, for example.
The system should, in theory, be able to detect that such avatars are fake images and whether or not they were generated using the same generative adversarial network (GAN). The model could then give investigators a good idea of the exact type of GAN used to create the deepfakes, or flag that the GAN is an unknown system.
The reverse engineering technique relies on two systems: a Fingerprint Estimation Network (FEN) and a Parsing Network (PN). First, the FEN processes an image and looks for hidden patterns that indicate the photo is a computer-generated fake. There might be a strange group of noisy pixels that appear in multiple images, for example, that give away the origin of the snaps.
Next, the PN analyses these hidden watermark-like signals to predict the number of layers in the deepfake neural network and how they might be connected. Finally, the output from the FEN is fed into a binary classifier to determine if a picture was computer generated or not.
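The FEN and classifier in the actual system are learned networks, but the core idea of a high-frequency "fingerprint" can be illustrated with a hand-rolled sketch. Everything below (`box_blur`, `extract_fingerprint`, `looks_generated`, and the fixed 0.01 energy threshold) is an illustrative assumption rather than the paper's architecture: it approximates a fingerprint as the residual left after smoothing an image, then flags images whose residual energy is suspiciously high.

```python
import numpy as np

def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Cheap low-pass filter: k x k box average with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def extract_fingerprint(img: np.ndarray) -> np.ndarray:
    """Stand-in for the FEN: keep only the high-frequency residual,
    where generation artefacts such as noisy pixel patterns tend to live."""
    return img - box_blur(img)

def looks_generated(img: np.ndarray, threshold: float = 0.01) -> bool:
    """Stand-in for the binary classifier: the real system learns this
    decision from data rather than using a fixed energy threshold."""
    energy = float(np.mean(extract_fingerprint(img) ** 2))
    return energy > threshold

rng = np.random.default_rng(0)
smooth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))  # smooth, camera-like gradient
noisy = rng.random((32, 32))                          # heavy high-frequency content
```

On these toy inputs the smooth gradient leaves almost no residual while the noisy image does, which is the intuition the trained networks exploit at far greater subtlety.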
The prototype model was trained on a dataset of 100,000 deepfakes produced by 100 GAN models.
By identifying synthetic images from their digital fingerprints, the system is better at detecting new types of false images that it hasn’t necessarily been trained on. “Overfitting is a huge issue in any deep model training,” Xiaoming Liu, co-author of the research and a professor in the department of computer science and engineering at MSU, told The Register.
“For example, a traditional binary deepfake detector could potentially use other information, such as overall lighting, that has little to do with the deepfake to achieve 100 per cent classification on the training set, and this model would generalize poorly when tested on a testing set.
“One way to remedy overfitting is to provide more supervision signals, so that the training is more likely to pick up the generalizable and truthful signals important to the deepfake. In our case, since different fingerprints or high-frequency noise on GAN generated images are the results of different network architecture and loss functions, providing those additional ground truth [details] in training will likely push the network to learn a fingerprint that is faithful to a specific [generative model]. Thus we will also be able to learn the more truthful boundary among the various GMs and between real and fakes.”
When the new system was tested on two academic datasets, it was able to accurately detect deepfakes with over 99 per cent accuracy. A better examination of the model’s abilities, though, would be to see if it can not only identify counterfeit images among real ones, but also reliably cluster the fake pictures by predicting which GANs were used to generate them.
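That clustering task can be sketched with toy data. Below, two hypothetical generators each leave a distinct fingerprint pattern, and a minimal hand-rolled k-means groups the images back to their source model; the two-generator setup, the noise levels, and all function names are assumptions for illustration, not the researchers' method.

```python
import numpy as np

def kmeans(X: np.ndarray, init: np.ndarray, iters: int = 10) -> np.ndarray:
    """Minimal k-means: returns a cluster label for each row of X."""
    centroids = init.copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each fingerprint to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned fingerprints
        for c in range(len(centroids)):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
# Two made-up "GAN fingerprints": a fixed pattern per generator, plus per-image noise.
base_a, base_b = rng.random(64), rng.random(64)
fingerprints = np.vstack(
    [base_a + 0.05 * rng.standard_normal(64) for _ in range(10)]
    + [base_b + 0.05 * rng.standard_normal(64) for _ in range(10)]
)
labels = kmeans(fingerprints, init=fingerprints[[0, 10]])
```

Because each generator's pattern dominates the per-image noise, the clusters recover the two sources exactly; real fingerprints are subtler, which is why the PN also predicts architectural properties rather than relying on raw distances alone.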
“This is just a beginning of a new research direction,” Liu told us. “We are constantly brainstorming on various paths to improve our model, both in terms of its prediction accuracy, and the applications enabled by our model. We are also in discussion with Facebook on how to improve the model so that it could be more applicable to real-world applications.”
The code for the model has been released on GitHub. ®