ChatGPT is coming for your jobs – the terrible ones, at least
OpenAI tech outperforms digital serfs toiling away on Amazon's estate
Machine learning models can do content processing and data sanitation work better and more affordably than people participating in crowdsourcing platforms, according to a trio of researchers.
And that's not necessarily a bad thing for job seekers since some of the jobs likely to be affected seem pretty awful.
University of Zurich researchers Fabrizio Gilardi, Meysam Alizadeh, and Maël Kubli examined how OpenAI's large language model ChatGPT handled text annotation – adding labels to text so machine learning models can better understand it – compared to workers on the crowdsourcing platform Amazon Mechanical Turk (MTurk).
The academics describe their findings in a paper whose title serves as a spoiler, "ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks."
Using a sample data set of 2,382 Twitter posts that had already been labeled by research assistants, the boffins compared how ChatGPT and MTurk workers fared with five distinct labeling tasks.
The work involved assessing how each tweet relates to the subject of content moderation in terms of relevance, stance, topics, and issue framing (e.g. whether content moderation is described as a problem that limits speech or a solution that protects against harmful speech).
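In a zero-shot setup like the one the paper describes, the model receives only a task description and the permitted labels, with no labeled examples. The sketch below shows what such a prompt might look like for the relevance task; the wording and label names are illustrative guesses, not the researchers' actual codebook.

```python
# Hypothetical sketch of a zero-shot annotation prompt.
# The task description and labels here are illustrative assumptions,
# not the exact instructions used in the Zurich study.

def build_zero_shot_prompt(tweet: str) -> str:
    """Construct a zero-shot relevance-labeling prompt for one tweet."""
    return (
        "Classify the tweet below for the topic of content moderation.\n"
        "Answer with exactly one label: RELEVANT or IRRELEVANT.\n\n"
        f"Tweet: {tweet}\n"
        "Label:"
    )

prompt = build_zero_shot_prompt(
    "Platforms should not decide what we can say online."
)
print(prompt)
```

A prompt like this would be sent once per tweet per task, and the model's single-word reply recorded as the annotation – the same unit of work an MTurk worker performs by clicking a label.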
"We find that for four out of five tasks, ChatGPT’s zero-shot accuracy is higher than that of MTurk," the paper stated. "...Moreover, ChatGPT is significantly cheaper than MTurk: the five classification tasks cost about $68 on ChatGPT (25,264 annotations) and $657 on MTurk (12,632 annotations)."
On a per-annotation basis, ChatGPT costs about $0.003, which is about twenty times cheaper than MTurk while being more accurate, the researchers say.
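The quoted figures are easy to sanity-check from the totals above:

```python
# Verify the per-annotation costs using only the numbers quoted in the paper.
chatgpt_total, chatgpt_annotations = 68.0, 25_264
mturk_total, mturk_annotations = 657.0, 12_632

chatgpt_per = chatgpt_total / chatgpt_annotations  # ≈ $0.0027, rounds to $0.003
mturk_per = mturk_total / mturk_annotations        # ≈ $0.052

print(f"ChatGPT: ${chatgpt_per:.4f}/annotation")
print(f"MTurk:   ${mturk_per:.4f}/annotation")
print(f"Ratio:   {mturk_per / chatgpt_per:.1f}x")  # ≈ 19x, i.e. "about twenty times"
```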
More accurate, in this instance, does not mean very accurate. Fabrizio Gilardi, a professor of policy analysis in the political science department of the University of Zurich and one of the paper's co-authors, told The Register in an email that ChatGPT's results were less than 50 percent accurate on some tasks – though that was still better than the MTurkers managed.
At first glance, the results appear to spell game over for human workers keen on securing this sort of work.
But Gilardi warned against reading the findings too broadly.
"It is too early to say how ChatGPT might replace crowd-workers," said Gilardi in a statement emailed to The Register. "Our paper demonstrates ChatGPT’s potential for data annotation tasks, but more research is needed to fully understand ChatGPT’s capacities in this area."
Gilardi said it's important to gather more data using different tasks, types of data, and languages, and added that MTurkers perform other jobs such as survey research, image annotation, audio and video transcription, and usability testing. And there may be scenarios in which human annotators can be more productive with the help of a model like ChatGPT, he suggested.
With that caveat, Gilardi said that for the types of tasks studied, ChatGPT looks like it can replace crowdsourced workers – which is fitting, given that models like it are trained on crowdsourced, human-annotated data. AI software taking over people's mundane work could even have mental health benefits: human moderators have sued over the trauma of reviewing toxic content.
"This has implications for unpleasant and harsh annotation tasks such as hate speech detection, which contains adverse psychological consequences for human annotators," Gilardi said. "In other words, tools such as ChatGPT could be a perfect candidate to replace or reduce human annotation for those tasks which involve ethical consideration for humans to do."
We've also lately held the opinion that ChatGPT and its ilk have one benefit: highlighting the humdrum tasks we have to do each day, such as summarizing reports, emailing the boss, writing boilerplate code, completing homework, and so on. By offering to tackle boring, repetitive work, the models can absorb some of that tedium – though a better solution might be to rethink how we employ people effectively and efficiently in the first place.
A recent Goldman Sachs report [PDF] characterizes the adoption of generative AI as a productivity boost rather than a job destroyer, we note. It says, "extrapolating our estimates globally suggests that generative AI could expose the equivalent of 300 million full-time jobs to automation." ®