
Don't believe the hype that AI-generated 'master faces' can break into face recognition systems any time soon

The machine learning model was trained and tested on limited data

Analysis The idea of so-called “master faces,” a set of fake images generated by machine learning algorithms to crack into facial biometric systems by impersonating people, made splashy headlines last week. But a closer look at the research reveals clear weaknesses that make it unlikely to work in the real world.

“A master face is a face image that passes face-based identity-authentication for a large portion of the population,” the paper, released on arXiv earlier this month, explained. “These faces can be used to impersonate, with a high probability of success, any user, without having access to any user-information.”

The trio of academics from Tel Aviv University go on to say they built a model that generated nine master faces which, between them, matched 40 per cent of the population in their test data and bypassed “three leading deep face recognition systems.” At first glance, that seems impressive, and the claims point to clear security risks for applications that rely on facial identification.

First, the team employed Nvidia’s StyleGAN system to create realistic-looking images of made-up faces. Each fake output was compared to one real photograph of each of the 5,749 different people represented in the Labeled Faces in the Wild (LFW) dataset. A separate classifier algorithm determined how similar the fake AI-generated faces looked to the real ones in the dataset.

Images that the classifier scored highly for similarity were kept, and the others were discarded. These scores were used to train an evolutionary algorithm to create more and more spoof faces using StyleGAN that looked like the people in the dataset.
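To make the keep-the-best-and-mutate loop described above concrete, here is a toy sketch in Python. It is not the researchers' code: the plain lists of numbers standing in for face embeddings, the Euclidean distance threshold standing in for the classifier, and the names `coverage` and `evolve` are all assumptions made for illustration, and no StyleGAN or real face recognition model is involved.

```python
import random


def coverage(candidate, dataset, threshold):
    """Fraction of dataset vectors within `threshold` (Euclidean) of candidate."""
    hits = 0
    for face in dataset:
        dist = sum((a - b) ** 2 for a, b in zip(candidate, face)) ** 0.5
        if dist < threshold:
            hits += 1
    return hits / len(dataset)


def evolve(dataset, dim, threshold, generations=30, pop_size=20,
           keep=5, sigma=0.3, seed=0):
    """Toy evolutionary search: keep the best candidates, mutate them, repeat.

    All parameters are hypothetical knobs for this sketch only.
    """
    rng = random.Random(seed)
    # Start from random candidate vectors (stand-ins for generated faces).
    population = [[rng.gauss(0, 1) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates by how much of the dataset they "match".
        population.sort(key=lambda c: coverage(c, dataset, threshold),
                        reverse=True)
        parents = population[:keep]  # high scorers are kept
        # The rest are discarded and replaced by mutated copies of parents.
        children = [[x + rng.gauss(0, sigma) for x in rng.choice(parents)]
                    for _ in range(pop_size - keep)]
        population = parents + children
    best = max(population, key=lambda c: coverage(c, dataset, threshold))
    return best, coverage(best, dataset, threshold)
```

Because the top candidates are carried over unchanged each round, the best score in this sketch can never go down from one generation to the next.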

Over time, the researchers were able to find a set of master faces that represented as many of the images in the dataset as they could. In short, they came up with just nine images to represent 40 per cent of the 5,749 different people in the Labeled Faces in the Wild dataset.
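One common way to pick a small set of images that together cover as much of a dataset as possible is a greedy selection: repeatedly take the candidate that matches the most people not yet covered. The sketch below illustrates that general idea only; the function name `greedy_cover`, the dictionary of match sets, and the toy numbers are hypothetical and are not taken from the paper.

```python
def greedy_cover(match_sets, total, max_faces):
    """Pick up to `max_faces` candidates that together match the most identities.

    match_sets: dict mapping a candidate id to the set of identity ids it matches
    total:      number of identities in the dataset
    Returns the chosen candidate ids and the fraction of identities covered.
    """
    covered = set()
    chosen = []
    for _ in range(max_faces):
        # Candidate that adds the most identities not yet covered.
        best = max(match_sets,
                   key=lambda c: len(match_sets[c] - covered),
                   default=None)
        if best is None or not (match_sets[best] - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= match_sets[best]
    return chosen, len(covered) / total


# Toy example: three candidate faces, ten identities in the "dataset".
faces = {"a": {1, 2, 3}, "b": {3, 4, 5}, "c": {6}}
picked, fraction = greedy_cover(faces, total=10, max_faces=2)
print(picked, fraction)
```

The fraction returned is always relative to the dataset supplied, which is why a headline coverage figure says little about populations the dataset does not represent.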

Next, they used these master faces to spoof three different facial recognition models: Dlib, FaceNet, and SphereFace. These systems ranked most highly in the contest that benchmarks the best face matching algorithms tested on the LFW dataset.

A quick look at the highest-scoring master faces capable of bypassing each of the three models, however, shows a clear limitation in the research. They’re pretty much all fake images of older Caucasian men with white hair, glasses, and mustaches. If the same type of image is able to represent a large share of the LFW dataset, then surely the dataset must be somewhat flawed.


The best master face that was able to trick Dlib (left), FaceNet (middle), and SphereFace (right). Taken from Figure 4 in the paper.

Garbage in, garbage out

A disclaimer posted on the website hosting the dataset confirms this: “Many groups are not well represented in LFW. For example, there are very few children, no babies, very few people over the age of 80, and a relatively small proportion of women. In addition, many ethnicities have very minor representation or none at all.”

The scores of the nine master faces reflect the limitations of the LFW dataset. Faces that are female, darker in skin tone, and younger are ranked lower and less likely to bypass the three models that were tested.


The nine master faces that represent 40 per cent of the LFW dataset. Notice how the scores are lower for people who are younger, female, or have darker skin tones. Taken from Figure 5 of the paper.

“While theoretically LFW could be used to assess performance for certain subgroups, the database was not designed to have enough data for strong statistical conclusions about subgroups. Simply put, LFW is not large enough to provide evidence that a particular piece of software has been thoroughly tested,” according to another disclaimer listed on the LFW’s website.

Although the idea of master faces capable of impersonating a large proportion of people to unlock face recognition systems is interesting, the research here is just another case of a machine learning model trained and tested using flawed data. Garbage in, garbage out, as they say.

There is a lack of diversity in the LFW dataset, so the computer-generated master faces are more likely to cover a larger proportion of that dataset. It’s unlikely that these images would work as well in the real world.

And no real-world tests

“LFW indeed suffers from the limitations described in its official website, but in spite of these limitations, LFW is a widely used dataset in the academic literature for evaluating face recognition methods,” Tomer Friedlander, co-author of the paper and a researcher at the School of Electrical Engineering at Tel Aviv University, told The Register.

“Our paper presents a possible vulnerability of face recognition systems, which can be exploited by attackers. Therefore, it should be taken into consideration by both developers and users of face recognition methods. We have not tested our method against commercial face recognition systems, which are used in real life, so we cannot refer to systems in real life.”

It’s possible to adapt the model to better datasets that are more diverse to try and trick systems in the real world, he said. “We are interested in further exploring the possibility of using the master faces generated by our method in order to help protect existing facial recognition systems from such attacks. We leave this for future research.”

Don’t fall for the scaremongering headlines claiming these master faces can break into “over 40 per cent of facial ID authentication systems” or that they’re “wildly successful”. There’s little evidence to support those claims.

Friedlander told us the paper has been accepted into this year’s IEEE International Conference on Automatic Face & Gesture Recognition, to be held in December. ®
