AI systems can now create images of humans that are so lifelike they look like photographs, except the people in them don’t really exist.
See for yourself. Each picture below is an output produced by a generative adversarial network (GAN), a system made up of two networks: a generator and a discriminator. Developers have used GANs to create everything from artwork to dental crowns.
Some of the images created from Nvidia's style transfer GAN. Image credit: Karras et al. and Nvidia
The performance of a GAN is often tied to how realistic its results are. What started out as tiny, blurry, greyscale images of human faces four years ago has since morphed into full colour portraits.
Early results from when the idea of GANs was first introduced. Image credit: Goodfellow et al.
The new GAN built by Nvidia researchers [PDF] rests on the idea of “style transfer”. First, the generator network learns a constant input taken from a photograph of a real person. This face is used as a reference, and encoded as a vector that is mapped to a latent space that describes all the features in the image.
These features correspond to the essential characteristics that make up a face: eyes, nose, mouth, hair, pose, face shape, etc. After the generator learns these features, it can begin adjusting these details to create a new face.
How the appearance of these features changes is determined by a second photo. In other words, the original photo copies the style of another photo so the end result is a sort of mishmash between both images. Finally, an element of noise is also added to generate random details, such as the exact placement of hairs, stubble, freckles, or skin pores, to make the images appear more realistic.
“Our generator thinks of an image as a collection of 'styles' where each style controls the effects at a particular scale,” the researchers explained. The different features can be broken down into various styles: coarse styles include the pose, hair and face shape; middle styles are made up of facial features; and fine styles determine the overall colour.
How the different style types are learned and transferred by crossing a photo with a source photo. Image credit: Karras et al. and Nvidia.
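For the curious, the crossing of styles described above can be sketched in a few lines of Python. This is a toy illustration, not Nvidia's code: the layer count, the layer ranges assigned to each style scale, the vector size, and the names `STYLE_GROUPS`, `random_styles` and `mix_styles` are all assumptions made up for this example, and random numbers stand in for the styles a real network would learn from photos.

```python
import random

# Assumed for illustration only: a generator with 18 layers, each driven by
# its own style vector, split into the three scales the researchers describe.
NUM_LAYERS = 18
STYLE_DIM = 8
STYLE_GROUPS = {
    "coarse": range(0, 4),   # pose, hair, face shape
    "middle": range(4, 8),   # facial features
    "fine": range(8, 18),    # overall colour
}

def random_styles(seed):
    """Stand-in for the per-layer style vectors derived from one photo."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(STYLE_DIM)]
            for _ in range(NUM_LAYERS)]

def mix_styles(base, source, groups):
    """Copy the chosen style scales from `source` into a copy of `base`."""
    mixed = [list(layer) for layer in base]
    for group in groups:
        for layer_index in STYLE_GROUPS[group]:
            mixed[layer_index] = list(source[layer_index])
    return mixed

photo_a = random_styles(seed=1)
photo_b = random_styles(seed=2)

# Take pose, hair and face shape from photo B; keep everything else from A.
mixed = mix_styles(photo_a, photo_b, ["coarse"])
```

Passing `["fine"]` instead would, under the same assumptions, keep the face from photo A and borrow only the colour scheme from photo B.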
The different style types can, therefore, be crossed continuously with other photos to generate a range of completely new images to cover pictures of people of different ethnicities, genders and ages. You can watch a video demonstration of this happening below.
The discriminator network inspects the images coming from the generator and tries to work out if they’re real or fake. The generator improves over time so that its outputs consistently trick the discriminator.
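That tug of war can be written down as a pair of loss functions. The sketch below uses the textbook GAN objective, which may well differ from the loss Nvidia actually trained with; the function names and the `EPS` safety margin are assumptions for this example. Here `p_real` and `p_fake` are the discriminator's estimated probabilities that a real photo and a generated image, respectively, are real.

```python
import math

EPS = 1e-12  # assumed small constant to keep log() away from zero

def discriminator_loss(p_real, p_fake):
    """Low when real photos score near 1 and generated images score near 0."""
    return -(math.log(max(p_real, EPS)) + math.log(max(1.0 - p_fake, EPS)))

def generator_loss(p_fake):
    """Low when the discriminator is fooled into scoring fakes near 1."""
    return -math.log(max(p_fake, EPS))

# A discriminator that can tell the two apart has a lower loss
# than one that is merely guessing.
sharp = discriminator_loss(p_real=0.9, p_fake=0.1)
guessing = discriminator_loss(p_real=0.5, p_fake=0.5)

# A generator whose fakes pass as real has a lower loss
# than one whose fakes are caught.
fooling = generator_loss(p_fake=0.9)
caught = generator_loss(p_fake=0.1)
```

Training alternates between nudging the discriminator to lower its loss and nudging the generator to lower its own, which is why each network's improvement makes the other's job harder.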
Great, more bots on the internet
It’s getting harder to tell if something is real or machine-made these days. Developing these tools allows researchers to explore and easily test GANs, but there could be potential downsides.
Remember when perverts used GANs to tweak porno videos by adding the faces of their favourite celebrities over the bodies of adult actresses when the faceswapping code was published on GitHub?
Well, Nvidia is also planning to publish its source code and datasets soon, so other people can have a crack at creating their own fake faces. It means that people might be able to do things like creating realistic-looking profile pictures for fake accounts, such as bots on Twitter or Facebook.
Although Nvidia’s results may be the best so far, there are still tiny discrepancies if you stare long enough. Kyle McDonald, an artist working with code, has published a list of things to look out for when trying to recognise if an image was generated by an AI system or not.
“At low resolutions, almost all the images in the paper are indistinguishable from photographs. There are only a few artifacts that stand out to me that I will try to address,” he said in a blog post earlier this month.
McDonald points out what he calls “the missing earring” problem. Often in the images, there is a small circular glitch below the ears that could come from the GAN attempting to add in earrings that were previously seen in photos. There are other slight oddities too, like the asymmetry of facial features, lack of detailing in teeth, strange hair strands, and patterns in clothing.
It’s not clear how easy it is to replicate Nvidia’s results, however. A spokesperson told El Reg that the paper was currently under peer review, and the submission rules don’t allow any preliminary discussions with the press until the paper has been published.
According to the paper [PDF], however, there are 26.2 million parameters that can be tweaked during the training process, so it’s probably not a project to take on if you haven’t got the money for the hardware and compute. ®