Scientists tricked into believing fake abstracts written by ChatGPT were real
Study warns tool could be used to create fake research papers for paper mills
Academics can be fooled into believing that bogus scientific abstracts generated by ChatGPT come from real medical papers published in top research journals, according to new research.
A team of researchers led by Northwestern University used the text-generation tool, developed by OpenAI, to produce 50 abstracts, each based on the title of a real scientific paper and written in the style of one of five medical journals.
Four academics were enlisted for the test and split into two groups of two. In each group, an electronic coin flip decided whether one reviewer was given a real abstract or a fake AI-generated one; the other reviewer received the opposite, so if one got a real abstract, the second got a fake, and vice versa. Each person reviewed 25 scientific abstracts.
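The paired coin-flip design can be sketched in a few lines. This is a hypothetical illustration, not the researchers' code: it assumes each group of two handled 25 titles (50 titles split across two groups), and the names `allocate`, `titles` and `groups` are invented for the example.

```python
import random

def allocate(titles, rng):
    """Hypothetical sketch: for each title, a coin flip decides which reviewer
    in a pair sees the real abstract; the other sees the generated one."""
    reviewer_a, reviewer_b = [], []
    for title in titles:
        if rng.random() < 0.5:  # "heads": reviewer A gets the real abstract
            reviewer_a.append((title, "real"))
            reviewer_b.append((title, "generated"))
        else:  # "tails": the assignment is swapped
            reviewer_a.append((title, "generated"))
            reviewer_b.append((title, "real"))
    return reviewer_a, reviewer_b

titles = [f"title-{i}" for i in range(50)]  # placeholder titles
groups = [titles[:25], titles[25:]]         # assumption: 25 titles per group
rng = random.Random(0)                      # seeded so the run is repeatable
assignments = [allocate(group, rng) for group in groups]

for reviewer_a, reviewer_b in assignments:
    print(len(reviewer_a), len(reviewer_b))  # each reviewer sees 25 abstracts
```

Pairing the flips this way means every title is seen once as a real abstract and once as a fake, while no single reviewer knows which version they hold.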
Reviewers correctly identified 68 per cent of the fake abstracts generated by AI and 86 per cent of the original abstracts from real papers. In other words, they were fooled into thinking 32 per cent of the AI-written abstracts were real, and 14 per cent of the real abstracts were fake.
Catherine Gao, first author of the study and a physician and scientist specialising in pulmonology at Northwestern University, said it shows ChatGPT can be pretty convincing. "Our reviewers knew that some of the abstracts they were being given were fake, so they were very suspicious," she said in a statement.
"The fact that our reviewers still missed the AI-generated ones 32 [per cent] of the time means these abstracts are really good. I suspect that if someone just came across one of these generated abstracts, they wouldn't necessarily be able to identify it as being written by AI."
Large language models like ChatGPT are trained on large amounts of text scraped from the internet. They learn to generate text by predicting which words are most likely to come next in a sequence, and can produce grammatically correct prose. They are also good at producing text with a clear structure and predictable patterns, while scientific abstracts often follow similar formats and can be quite vague. So it isn't surprising that even academics can be fooled into believing AI-generated abstracts are real.
"Our reviewers commented that it was surprisingly difficult to differentiate between the real and fake abstracts," Gao said. "The ChatGPT-generated abstracts were very convincing…it even knows how large the patient cohort should be when it invents numbers." A fake abstract about hypertension, for example, described a study with tens of thousands of participants, whilst one on monkeypox included a smaller number of patients.
Gao believes tools like ChatGPT will make it easier for paper mills, who profit from publishing studies, to churn out fake scientific papers. "If other people try to build their science off these incorrect studies, that can be really dangerous," she added.
There are advantages to using these tools too, however. Alexander Pearson, co-author of the study and an associate professor of medicine at the University of Chicago, said they could help scientists who are not native English speakers write better and share their work.
Software is better than humans at detecting machine-generated text. The free GPT-2 Output Detector, for example, was able to guess with over 50 per cent confidence that 33 of the 50 generated abstracts were indeed written by a language model. The researchers believe paper submissions should be run through these detectors, and that scientists should be transparent about using these tools.
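The "over 50 per cent confidence" rule amounts to a simple threshold on the detector's score. The sketch below is a hypothetical illustration: it assumes the detector returns a probability that a text is machine-generated, and the function names and sample scores are invented, not taken from the study.

```python
def flagged_as_generated(scores, threshold=0.5):
    """Count texts whose (assumed) machine-generated probability exceeds the threshold."""
    return sum(1 for score in scores if score > threshold)

def detection_rate(flagged, total):
    """Fraction of generated texts that the detector flagged."""
    return flagged / total

# Hypothetical scores, for illustration only
example_scores = [0.99, 0.12, 0.87, 0.50, 0.63]
print(flagged_as_generated(example_scores))  # 3: a score of exactly 0.50 is not "over" 50 per cent

# The figure reported in the study: 33 of 50 generated abstracts flagged
print(detection_rate(33, 50))  # 0.66
```

Read this way, the detector caught about 66 per cent of the fakes at that threshold, roughly comparable to the 68 per cent the human reviewers managed.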
"We did not use ChatGPT in the writing of our own abstract or manuscript, since the boundaries of whether this is considered acceptable by the academic community are still unclear. For example, the International Conference on Machine Learning has instituted a policy prohibiting its use, though they acknowledge that the discussion continues to evolve and also clarified that it is okay for it to be used in 'editing or polishing'," Gao told The Register.
"There have been groups who have started using it to help writing, though, and some have included it as a listed co-author. I think that it may be okay to use ChatGPT for writing help, but when this is done, it is important to include a clear disclosure that ChatGPT helped write sections of a manuscript. Depending on what the scientific community consensus ends up being, we may or may not use LLMs to help write papers in the future." ®