Researchers who rely heavily on social media data to study human behaviour have been warned that such information is easily skewed.
Computer scientists at McGill University in Montreal and Carnegie Mellon University in Pittsburgh said in a paper published yesterday in the journal Science that academics were failing to spot the flaws in the data.
And yet, in recent years, there has been an explosion of studies that use social media as a barometer for all kinds of predictions about human behaviour and the world we live in.
"Many of these papers are used to inform and justify decisions and investments among the public and in industry and government," said Derek Ruths, an assistant professor of computer science at McGill.
He added: "The common thread in all these issues is the need for researchers to be more acutely aware of what they're actually analysing when working with social media data."
The boffins offered up a list of "challenges" faced by researchers who glean their statistics from social media data.
- Different social media platforms attract different users – Pinterest, for example, is dominated by women aged 25 to 34 – yet researchers rarely correct for the distorted picture these populations can produce.
- Publicly available data feeds used in social media research don't always provide an accurate representation of the platform's overall data – and researchers are generally in the dark about when and how social media providers filter their data streams.
- The design of social media platforms can dictate how users behave and, therefore, what behaviour can be measured. For instance, on Facebook the absence of a "dislike" button makes negative responses to content harder to detect than positive "likes".
- Large numbers of spammers and bots, which masquerade as normal users on social media, get mistakenly incorporated into many measurements and predictions of human behaviour.
- Researchers often report results for groups of easy-to-classify users, topics and events, making new methods seem more accurate than they actually are. For instance, efforts to infer the political orientation of Twitter users achieve barely 65 per cent accuracy for typical users, even though studies focusing on politically active users have claimed 90 per cent accuracy.
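To give a rough idea of what "correcting for" a skewed user base can involve, here is a minimal sketch of post-stratification reweighting in Python: instead of averaging over whoever happens to be on the platform, each demographic group's average is weighted by that group's share of the population being studied. This is an illustrative technique chosen for the example, not the method proposed in the paper; the strata names, population shares and responses are all made up.

```python
# Hypothetical sketch of post-stratification reweighting.
# Strata, population shares and responses are illustrative assumptions,
# not figures from the McGill/Carnegie Mellon paper.

def stratum_means(samples):
    """Group (stratum, value) pairs and return the mean value per stratum."""
    totals, counts = {}, {}
    for stratum, value in samples:
        totals[stratum] = totals.get(stratum, 0.0) + value
        counts[stratum] = counts.get(stratum, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}

def naive_estimate(samples):
    """Plain mean over the platform sample, ignoring who uses the platform."""
    return sum(v for _, v in samples) / len(samples)

def poststratified_estimate(samples, population_shares):
    """Weight each stratum's sample mean by its share of the target population."""
    means = stratum_means(samples)
    missing = set(population_shares) - set(means)
    if missing:
        # A group absent from the platform cannot be reweighted back in.
        raise ValueError(f"no sample observations for strata: {sorted(missing)}")
    return sum(share * means[s] for s, share in population_shares.items())

# Toy platform sample heavily skewed towards one group (1 = "agrees", 0 = "doesn't").
sample = ([("women_25_34", 1)] * 7 + [("women_25_34", 0)]
          + [("everyone_else", 0), ("everyone_else", 1)])
# Assumed shares of each group in the wider population.
shares = {"women_25_34": 0.1, "everyone_else": 0.9}

print(naive_estimate(sample))                    # 0.8
print(poststratified_estimate(sample, shares))   # roughly 0.54
```

The naive figure mostly reflects the over-represented group; the reweighted one leans on the much smaller "everyone_else" sample, which is why thin coverage of some groups remains a problem even after correction.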
Despite the blindingly obvious weaknesses found in such data, Ruths remained optimistic about the use of social media in research, provided that researchers tackle the problems he and his colleagues outlined.
The Social Media for Large Studies of Behaviour paper can be viewed here. ®