Arrogant, subtle, entitled: 'Toxic' open source GitHub discussions examined
Developer interactions sometimes contain their own kind of poison
Analysis Toxic discussions on open-source GitHub projects tend to involve entitlement, subtle insults, and arrogance, according to an academic study. That contrasts with the toxic behavior – typically bad language, hate speech, and harassment – found on other corners of the web.
Obvious or not, the point matters. Technical and non-technical methods for detecting and curbing toxic behavior elsewhere on the internet may not work well on GitHub. If you're involved in communities on the code-hosting giant, you may find this research useful in combating trolls and unacceptable conduct.
It may also mean that systems intended to automatically detect and report toxicity in open-source projects, or at least those on GitHub, need to be developed specifically for that task, because toxicity there takes a distinctive form.
Computer scientists at Carnegie Mellon University and Wesleyan University in the US recently conducted a study of online toxicity to understand how it manifests in open source communities.
The researchers – Courtney Miller, Sophie Cohen, Daniel Klug, Bogdan Vasilescu, and Christian Kästner – describe their findings in a paper titled "'Did You Miss My Comment or What?' Understanding Toxicity in Open Source Discussions," which was presented last month at the ACM/IEEE International Conference on Software Engineering in Pittsburgh, Pennsylvania.
In a video explainer, Miller, a doctoral student at CMU's Institute for Software Research and lead author on the paper, says the project adopted the definition of toxicity proposed by those working on Google's Perspective API: "rude, disrespectful, or unreasonable language that is likely to make someone leave a discussion."
"Virtually all online platforms recognize the threat that toxicity, or the various types of behavior under its umbrella, poses on the health and safety of online communities," the paper says. "As a result, a number of prevention and mitigation policies and interventions have been proposed, including codes of conduct, moderation, counterspeech, shadow banning, or just-in-time guidance to authors."
With regard to open-source projects, the paper cites the Linux Kernel Mailing List as having been notorious for unwelcoming interaction. Linux creator Linus Torvalds acknowledged as much at The Linux Foundation's recent Open Source Summit. He said he has been at times "overly impolite" and apologized for what he described as "a personal failing."
The open source community's long tradition of blunt interaction has led many projects to adopt codes of conduct, the paper notes. The reason for doing so is to encourage contributors to join open source projects and to keep them from being driven away by trolling and other forms of hostility.
The researchers acknowledge that "toxicity in open source is often written off as a naturally occurring if not necessary facet of open source culture." And while there are those who defend a more rough-and-tumble mode of online interaction, there are consequences for angry interactions. Witness the departures in the Perl community over hostility.
"Toxicity is different in open-source communities," Miller said in a CMU news release. "It is more contextual, entitled, subtle and passive-aggressive."
The Register asked Miller via email whether "toxic" as a term leaves room for individuals with different levels of tolerance for "toxic" behavior, given that labeling something "toxic" forecloses discussion about whether it's appropriate.
"When manually confirming that the comments were toxic, we considered a comment toxic if it could have made anyone want to leave the discussion, including a newcomer," Miller told us. "This was intentionally done because traditionally speaking in many open source projects, toxicity is often dismissed as a naturally occurring and necessary aspect of the culture.
"However, many open source contributors have cited toxic and continuously negative behavior as their reason for disengaging (see Section 2 of our paper for more details). Because of this, it was important to consider toxicity that could be considered toxic to a wide spectrum of open source contributors."
Toxicity in open source projects is relatively rare – the researchers in previous work found only about six per 1,000 GitHub issues to be toxic. That meant a random sampling of issues wouldn't serve the research objective, so the group adopted several strategies for identifying toxic issues and comments: running a language-based detector, searching for mentions of "codes of conduct," and looking for threads that had been locked or deleted.
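To make the sampling problem concrete, here is a rough Python sketch of how such a candidate filter could look. It is not the researchers' code: the `Issue` record, its field names, the `detector_score` value, and the 0.8 threshold are all hypothetical stand-ins. The one figure taken from the study is the rate of about six toxic issues per 1,000, which is used to estimate how many randomly chosen issues would need reviewing to turn up a given number of toxic ones.

```python
import math
from dataclasses import dataclass


@dataclass
class Issue:
    """Hypothetical issue record; the fields are illustrative, not GitHub's schema."""
    title: str
    comments: list[str]
    locked: bool = False
    deleted: bool = False
    detector_score: float = 0.0  # assumed 0..1 output of some language-based detector


def candidate_reasons(issue: Issue, score_threshold: float = 0.8) -> list[str]:
    """Return which of the four heuristics flag this issue for manual review."""
    reasons = []
    if issue.detector_score >= score_threshold:
        reasons.append("detector")
    text = " ".join([issue.title, *issue.comments]).lower()
    if "code of conduct" in text or "codes of conduct" in text:
        reasons.append("code_of_conduct")
    if issue.locked:
        reasons.append("locked")
    if issue.deleted:
        reasons.append("deleted")
    return reasons


def expected_random_sample(target_toxic: int, rate_per_thousand: int = 6) -> int:
    """Issues to review, on average, to find target_toxic toxic ones at random."""
    return math.ceil(target_toxic * 1000 / rate_per_thousand)


if __name__ == "__main__":
    heated = Issue(
        title="Why was this removed?",
        comments=["Please read our Code of Conduct before replying."],
        locked=True,
    )
    print(candidate_reasons(heated))    # ['code_of_conduct', 'locked']
    print(expected_random_sample(100))  # 16667
```

At that rate, finding 100 toxic issues by chance would mean reading roughly 16,667 issues, which is why targeted heuristics make sense. A flag here only nominates a thread for human review; the researchers still confirmed toxicity manually.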
The result was a data set of 100 toxic issues on GitHub. What the researchers found was that toxicity on the Microsoft-owned website has its own particular characteristics.
An unpleasant subtle flavor
"Unlike some other platforms where the most frequent types of toxicity are hate speech or harassment, we find entitlement, insults, and arrogance are among the most common types of toxicity in open source," the paper explains.
The computer scientists note that GitHub Issues, while they include insults, arrogance, and trolling seen elsewhere, do not exhibit the severe language common on platforms like Reddit and Twitter. Beyond milder language, GitHub differs in its abundance of entitled comments – people making demands as if their expectations were based on a contract or payment.
"In contrast to arrogant, insulting, and trolling comments, entitled comments seem to be a phenomenon more specific to open source and the dynamics of free-to-use software, with seemingly free support despite no contractual obligations," the paper says.
The milder nature of GitHub toxicity appears to be related to the fact that most of the toxic interaction comes not from trolls or anonymous users, but from experienced open source developers and project maintainers.
What does GitHub toxicity look like? Miller cites one discussion from the Elementary OS repo two years ago, pointing to passages like, "The problem is your team forcing us to use the OS the way you want us to use it although it makes it 1,000,000 times harder to use it your way, than what would be convenient for us."
"These quotes help encapsulate the entitled, demanding, and often insulting nature of the toxic comments frequently found in the open source communities we observed," Miller said.
Danielle Foré, founder of Elementary, closed the issue thread, explaining, "I'd like to be really clear that our policy is not to lock threads where there are dissenting opinions. Respectful debate is healthy and important. But when you start calling names and being disrespectful and disregarding the code of conduct, the discussion becomes unproductive. This kind of destructive behavior is not tolerated in our community."
The researchers identify a variety of triggers for toxic behavior, which mostly occur in large, popular projects. These include: trouble using software, technical disagreements, politics/ideology, and past interactions.
What's the harm?
The paper explores the nature of toxicity on GitHub but does not examine when real harm arises from it.
"The harms of toxicity were outside the scope of this project, but informally we observed that one thing that seemed to be an efficient way of curbing toxicity was for maintainers to cite their project's code of conduct and lock the thread as too heated," said Miller. "This seemed to help reduce the amount of time and emotional labor involved with dealing with the toxicity."
Asked about the findings, Martin Woodward, senior director of developer relations at GitHub, said in an emailed statement, "While it was reassuring to see that some of the more extreme toxic behaviors of online communities are less common on GitHub, the issues raised around people feeling entitled to put demands on the volunteer community leaders who run most open source projects is definitely something we recognize from our conversations with maintainers.
"We continue to try to educate people on GitHub with events like Maintainer Month, but we also continue to work with maintainers and researchers to improve the community moderation capabilities we provide."
Miller said that while she and her colleagues have not spoken directly with GitHub about their findings, they do have ideas for mitigating hostile interactions.
"One recommendation is that since open-source toxicity manifests differently than toxicity on other platforms, in order to effectively identify and intervene on it, future work should build an open-source specific toxicity detector," said Miller. ®