Facebook uses one billion Instagram photos to build massive object-recognition AI that partly trained itself

Proof-of-concept SEER taught over eight days using 512 GPUs

Facebook has trained its most advanced semi-supervised computer vision system yet on a dataset of a billion public images taken from Instagram, its other social network.

Known as SEER, short for SElf-supERvised, this massive convolutional neural network contains over a billion parameters. If you show it images of things, it will describe in words what it recognizes: a bicycle, a banana, a red-and-blue striped golfing umbrella, and so on. While its capabilities aren't all that novel, the way it was trained differs from the techniques used to teach other types of computer vision models. Essentially, SEER partly taught itself using an approach called self-supervision.

First, it learned how to group the Instagram pictures by their similarity without any supervision, using an algorithm nicknamed SwAV. The team then fine-tuned the model by teaching it to associate a million photos taken from the ImageNet dataset with their corresponding human-written labels. This second stage was a traditional supervised method: humans curated the photos and labels, and that knowledge was passed on to the neural network, which had pretrained itself.

The software thus gains familiarity with a billion images from Instagram, learning how to group together similar pictures, and is then trained to caption those pictures using a million ImageNet examples. That, to us, seems more efficient than accurately labeling a billion 'gram snaps to feed into a neural network.
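To make the two-stage recipe concrete, here's a minimal PyTorch sketch of the pretrain-then-fine-tune pattern. The backbone, layer sizes, and hyperparameters below are our own placeholders, not Facebook's actual SEER configuration:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Stage 1: self-supervised pretraining on unlabelled images.
# A ResNet-50 stands in for SEER's far larger backbone; the SwAV-style
# clustering objective would be optimized here (see the sketch below).
backbone = resnet50(weights=None)
backbone.fc = nn.Identity()   # expose 2048-d features, drop the classifier

# Stage 2: supervised fine-tuning. Attach a linear head and train on
# labelled examples, e.g. (image, label) pairs from ImageNet.
model = nn.Sequential(backbone, nn.Linear(2048, 1000))  # 1,000 classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def finetune_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised fine-tuning step on a labelled minibatch."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The appeal of the pattern is that the expensive part, learning general visual features, needs no labels at all; only the comparatively cheap captioning stage is trained on curated data.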

“We took advantage of a new algorithm called SwAV, which developed from FAIR research into self-supervised learning,” Facebookers Priya Goyal, Vittorio Caggiano, Piotr Bojanowski, and Armand Joulin explained this week, referring to Facebook AI Research, aka FAIR.

"SwAV uses online clustering to rapidly group images with similar visual concepts and leverage their similarities. With SwAV, we were able to improve over the previous state of the art in self-supervised learning — and did so with 6x less training time."

SEER thus learned to associate an image of, say, a red apple with the description "red apple." Once trained, the model's object-recognition skills were tested using 50,000 pictures from ImageNet it had not seen before: in each test it had to produce a set of predictions of what was pictured, ranked in confidence from high to low. Its top prediction in each test was accurate 84.2 per cent of the time, we're told.
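That 84.2 per cent figure is top-1 accuracy: a test counts as a hit only when the model's single highest-confidence prediction matches the ground-truth label. Roughly how it's computed, with the model and dataloader as placeholders:

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cuda") -> float:
    """Fraction of samples whose highest-confidence prediction is correct."""
    model.eval()
    correct = total = 0
    for images, labels in loader:            # e.g. 50,000 held-out images
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)  # top-ranked class per image
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total           # e.g. 84.2 for SEER
```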

That score still trails fully supervised rivals on the ImageNet benchmark: the trade-off with models like SEER is that they give up some accuracy compared with their supervised cousins. Yet there are advantages to training in a semi-supervised way, Goyal, first author of the project's paper on SEER, told The Register.

“Using self-supervision pretraining, we can learn on a more diverse set of images as we don’t require labels, data curation or any other metadata," she said. "This means that the model can learn about more visual concepts in the world in contrast to the supervised training where we can only train on limited or small datasets that are highly curated and don’t allow us to capture visual diversity of the world.”

Goyal believes that the technique will prove useful in areas including medical imaging where it’s difficult to amass large labelled datasets from private clinical data. “SEER’s performance demonstrates that self-supervised learning can excel at computer vision tasks in real-world settings. This is a major breakthrough that ultimately clears the path for more flexible, accurate, and adaptable computer vision models in the future,” the team reported.

SEER was trained over eight days using 512 GPUs. The code for the model isn’t publicly available, although VISSL, the PyTorch library that was used to build SEER, is now up on GitHub.

Facebook told us SEER remains a proof-of-concept idea and won’t be used to power any of the web giant's features or products for the moment. ®
