Mining social networks for every scrap of information about our online lives is now common practice for marketers, academics, government agencies, and so on.
Text in tweets, blogs and other posts is valuable because it's searchable, analyzable, and not terribly costly to crawl, fetch or store. But ongoing computer vision advancements have opened up the wealth of information encoded in images.
Earlier this week, researchers from the University of California, Los Angeles described a way to analyze images to find protesters, to characterize their activities, and to assess the level of violence depicted.
In a paper titled "Protest Activity Detection and Perceived Violence Estimation from Social Media Images," graduate student Donghyeon Won, assistant professor of public policy Zachary C Steinert-Threlkeld, and assistant professor of communication studies Jungseock Joo explore how imagery can be used to understand protests, because text may not be reliable.
"The important feature of our method is objectivity," said Joo in an email to The Register. "Text can be made up easily. There are lots of fake accounts too. Some people may say a protest is violent; others may say peaceful. But when you actually see a photograph with people shot and bleeding, you know it is violent."
What's novel here, said Joo, is activity recognition from images rather than text. "At the moment, no one else is capable of analyzing visual content in social media to characterize social movements," he said. "I am not aware of any other work using images."
The researchers collected, via keyword search and the Twitter data stream, 10,000 images likely to be related to protests. They then trained a classification convolutional neural network to build a set of likely protest images and a set of negative examples, and passed the pictures for annotation by Amazon Mechanical Turk workers.
The Turkers were directed to identify whether an image contained protest activity or protesters, to identify visual attributes in the scene, and to estimate the perceived level of violence and emotional sentiment.
Using models built from this data and other software, like OpenFace for face recognition and dlib for machine learning, the UCLA boffins designed a protest recognition system. They claim it performs very well for identifying violence in images and less well for identifying emotion, which they suggest may be attributable to the inconsistency of the training data generated by Amazon Mechanical Turk workers.
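To make the point about inconsistent labels concrete, here is a minimal Python sketch of one way to measure how much annotators disagree about an image. It is an illustration only, not the researchers' method: the image IDs, the 1-to-5 rating scale, and the `summarize_ratings` and `most_contested` helpers are all hypothetical.

```python
from statistics import mean, pstdev


def summarize_ratings(ratings_by_image):
    """Return {image_id: (mean rating, population std dev)} per image."""
    summary = {}
    for image_id, ratings in ratings_by_image.items():
        summary[image_id] = (mean(ratings), pstdev(ratings))
    return summary


def most_contested(ratings_by_image, top_n=1):
    """Return the image IDs whose ratings are most spread out."""
    summary = summarize_ratings(ratings_by_image)
    ranked = sorted(summary, key=lambda image_id: summary[image_id][1], reverse=True)
    return ranked[:top_n]


# Hypothetical ratings from three annotators per image (made-up numbers).
example_ratings = {
    "img_1": [2, 2, 2],  # annotators agree
    "img_2": [1, 5, 3],  # annotators disagree
}

if __name__ == "__main__":
    for image_id, (avg, spread) in summarize_ratings(example_ratings).items():
        print(image_id, avg, round(spread, 3))
    print("most contested:", most_contested(example_ratings))
```

A wide spread for an image suggests its label is noisy, which is the kind of inconsistency the researchers point to when explaining weaker results on emotion.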
Applying their system to five protest events – the Women's March, Black Lives Matter, and protests in South Korea, Hong Kong, and Venezuela – the researchers contend that "protests in Venezuela are more violent and angrier than the other protests" and that "the Women's March is the least violent and angry" of all the protests studied.
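Event-level comparisons of this kind imply rolling per-image scores up into one figure per protest. A rough Python sketch of that step follows, assuming each image already carries a violence score between 0 and 1. The placeholder event names, the scores, and the `mean_by_event` and `rank_events` helpers are invented for illustration; the article does not describe how the paper itself aggregates.

```python
from collections import defaultdict
from statistics import mean


def mean_by_event(scored_images):
    """Average per-image scores for each event.

    scored_images is an iterable of (event_name, score) pairs.
    """
    buckets = defaultdict(list)
    for event, score in scored_images:
        buckets[event].append(score)
    return {event: mean(scores) for event, scores in buckets.items()}


def rank_events(scored_images):
    """Return event names ordered from highest to lowest mean score."""
    averages = mean_by_event(scored_images)
    return sorted(averages, key=averages.get, reverse=True)


# Hypothetical (event, violence score) pairs; not data from the paper.
sample_scores = [
    ("event_a", 0.75),
    ("event_a", 0.25),
    ("event_b", 0.0),
    ("event_b", 0.125),
    ("event_c", 0.5),
    ("event_c", 1.0),
]

if __name__ == "__main__":
    print(mean_by_event(sample_scores))
    print(rank_events(sample_scores))
```

A plain mean is the simplest choice; it treats every image equally, so an event with a handful of dramatic photos and one with thousands of images are compared on the same footing.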
They also note that there were more women at the Women's March, more African Americans at Black Lives Matter, and that the South Korean protests were the most well-organized, based on the greater presence of large groups in images.
While such conclusions may be intuitive, they're less easily challenged as opinion when derived from accepted algorithms. "Our paper will enable fair and objective reporting of protest events," Joo explained.
The researchers' model isn't perfect. They note that it failed to understand the meaning of symbolic acts like "die-ins" at the Black Lives Matter protest. That is to say, it may treat people pretending to be casualties as actual casualties, which could sway the computed level of violence. But refinement and iteration should be expected.
Who watches the social media watchers
In an email to The Register, Jeff Bigham, associate professor at Carnegie Mellon's Human-Computer Interaction Institute in Pittsburgh, Pennsylvania, said scientists have employed computer vision for activity tracking for a long time.
Such work, he said, invites questions about how it will be used.
"With growing abilities of computer vision and our political climate, many people have begun to worry about how computer vision could be used at scale by, for instance, an authoritarian regime to detect political dissidents," he said. "There was a paper out last week that claimed some accuracy in identifying protesters whose faces are partially obscured."
Without addressing the details of the UCLA research, Bigham said data generated through Amazon Mechanical Turk can present problems. "There's a lot of subtlety in whether what Turkers labeled as violent really was, and whether it's what authoritarians would target," he said.
Steinert-Threlkeld, in an email to The Register, said he and his colleagues approached this project without a specific end user in mind. "You can imagine why governments may be interested in detecting these activities, violent or not," he said. "On the other side, protest organizers could monitor photos to see whether or not a protest is becoming violent."
He suggested protest organizers might use this sort of technology to watch for emerging violence in order to defuse tensions, because violent movements tend to receive less support than non-violent ones.
Asked whether he worried that the development of capable image and text monitoring technology would have a chilling effect on social media participation, he said he was not concerned.
"The data-gathering and analytic capabilities of governments and major businesses far exceeds what we have for this project," said Steinert-Threlkeld. "They already use text to detect protests and predict changes, and I expect they are using images already." ®