Academic publishers turn to AI software to catch bad scientists doctoring data
High level of image duplication in a single paper is a sign of cheating
Analysis Shady scientists trying to publish bad research may want to think twice as academic publishers are increasingly using AI software to automatically spot signs of data tampering.
Image duplication, where the same picture of a cluster of cells, for example, is copied, flipped, rotated, shifted, or cropped, is unfortunately quite common. In cases where the errors aren't accidental, the doctored images are created to look as if the researchers have more data and conducted more experiments than they really did.
Image duplication was the top reason the American Association for Cancer Research (AACR) retracted papers from 2016 to 2020, according to Daniel Evanko, the association's Director of Journal Operations and Systems. Having to retract a paper damages the reputations of both the authors and the publisher. It shows that the quality of work from the researchers was poor, and that the editor's peer review process missed mistakes.
To prevent embarrassment for both parties, academic publishers like AACR have turned to AI software to detect image duplication before a paper is published in a journal. The AACR started trialling Proofig, an image-checking programme developed by an Israel-based startup of the same name. Evanko presented results from the pilot study to show how Proofig impacted AACR's operations at the International Congress on Peer Review and Scientific Publication, held in Chicago this week.
AACR publishes ten research journals and reviews over 13,000 submissions every year. From January 2021 to May 2022, officials used Proofig to screen 1,367 manuscripts that had been provisionally accepted for publication and contacted authors in 208 cases after reviewing image duplicates flagged by the software. In most cases, the duplication is a sloppy error that can be fixed easily. Scientists may have accidentally got their results mixed up and the issue is often resolved by resubmitting new data.
On rare occasions, however, the dodgy images highlighted by the software are a sign of foul play. Four papers out of the 208 were withdrawn, and one was rejected afterwards. Academic fraud is uncommon, and is often associated with paper mills or less reputable institutions. Cases of cheating, however, have been and continue to be uncovered at top labs from prestigious universities. A recent investigation by Science reported that decades of Alzheimer's research, which led to fruitless searches for new treatments and failed clinical trials, are based on a highly cited paper plagued with image duplication.
The results in question, a series of blurry lines produced using a technique known as Western blotting, have allegedly been copied, edited, and pasted in mice data. The duplicitous effects are very difficult to spot for the untrained eye. Looking for such subtle changes is a tedious task for most humans, but one well-suited for computers, Proofig's co-founder Dror Kolodkin-Gal told The Register.
Proofig's job is to first detect every image in an uploaded paper that is relevant for analysis. The software ignores pictures of bar charts or line graphs. Proofig then has to check each image against all the other subimages in the paper. The subimages may be shifted, flipped, or rotated; parts may be cropped, copied, or repeated. "There are so many possibilities," Kolodkin-Gal said.
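To make the matching problem concrete, here is a minimal Python sketch of the simplest version of that check: comparing every pair of subimages under mirror flips and 90-degree rotations. It is not Proofig's algorithm, which the company has not published; the function names, the tiny pixel grids, and the exact-match test are all assumptions made for illustration. Real tools also have to cope with crops, shifts, resizing, and compression noise, which this toy ignores.

```python
def rotate90(img):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror a 2D grid left to right."""
    return [list(row[::-1]) for row in img]


def orientations(img):
    """Yield the 8 variants of a grid: 4 rotations, each with and without a flip."""
    current = [list(row) for row in img]
    for _ in range(4):
        yield current
        yield flip_horizontal(current)
        current = rotate90(current)


def is_transformed_duplicate(a, b):
    """True if b is pixel-identical to a after some rotation and/or flip."""
    target = [list(row) for row in b]
    return any(variant == target for variant in orientations(a))


def find_duplicate_pairs(subimages):
    """Compare every pair of named subimages and return the matching name pairs."""
    names = sorted(subimages)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if is_transformed_duplicate(subimages[first], subimages[second]):
                pairs.append((first, second))
    return pairs


# Hypothetical figure panels, reduced to tiny grids of pixel values.
panel_1a = [[1, 2, 3], [4, 5, 6]]
panel_2c = rotate90(panel_1a)          # the same picture, rotated
panel_3b = [[9, 9, 9], [0, 0, 0]]      # an unrelated picture

flagged = find_duplicate_pairs({"1A": panel_1a, "2C": panel_2c, "3B": panel_3b})
print(flagged)  # [('1A', '2C')]
```

Even this stripped-down version has to try eight orientations for every pair of panels, which hints at why the full problem, with partial overlaps and edits thrown in, is expensive to compute.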
Proofig uses a mixture of computer vision and AI algorithms to extract and classify images. The software is computationally complex, and would not have been possible without the recent progress in machine learning, Kolodkin-Gal thought. "Before AI, just to extract the subimages from a paper would have required ten times more investment in R&D, and god knows how to do the computing. I think the improvement in technology both in algorithms and the ability to run GPUs in the cloud is what has changed things," he said.
Humans in the loop required
AI software like Proofig can't catch out cheaters on its own. "You still need a human with some knowledge and expertise to interpret the results," Elisabeth Bik, an image forensics expert and independent science consultant, told The Register. "You cannot just let software automatically run its course, it might flag a lot of things that are perfectly fine." In some cases, the human eye can outperform computers.
Bik uses a different AI-based software called ImageTwin for her work. Sometimes it struggles with analysing Western blots. "A Western blot is basically just a black stripe on a plain background. There are little subtleties in the shape that I see as a human but that the software somehow cannot see, and I think it just has to do with the way our eyes and brains are super complex. I think the software just looks at relative distances and so a black stripe always looks like a black stripe, and it's not very good at finding little edges or the shape of a block that is similar to another shape," she said.
Kolodkin-Gal agreed that Western blots are particularly challenging for machines to inspect. "It took us a lot of investment to finally find a good algorithm in order to find those Western bands. It's very, very challenging to AI because they are very small," he said.
Academic publishers use image-checking tools like Proofig at different stages of the publishing process. AACR scanned manuscripts that had been tentatively approved, whereas others like Taylor & Francis use it only to check papers where concerns have been raised by editors or peer reviewers. "If the software detects potential image duplication or manipulation, and that is supported by our specialist team, we will begin an investigation following our established procedures and the guidelines set out by the Committee on Publication Ethics for such cases," a spokesperson representing the company told us.
The decision when and where to use these tools in the publishing pipeline is a matter of cost. Image processing is computationally intensive, and publishers must cover cloud computing costs for a startup like Proofig.
Screening every paper at the submission stage would be too expensive. Analysing 120 subimages with Proofig, for example, will cost an individual $99. It isn't cheap, given the number of possible combinations Proofig has to process in a single paper.
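A rough back-of-the-envelope calculation shows how quickly those combinations add up. The Python sketch below counts the unordered pairs among 120 subimages, then multiplies by eight orientations per pair (four rotations, each with and without a mirror flip). The figure of eight is our own simplifying assumption, not a number supplied by Proofig, and it leaves out crops and shifts, which would push the total far higher.

```python
from math import comb


def pairwise_comparisons(n_subimages, orientations=8):
    """Return (unordered pairs, pairs x orientations) for n subimages.

    The default of 8 orientations is an illustrative assumption.
    """
    pairs = comb(n_subimages, 2)
    return pairs, pairs * orientations


pairs, checks = pairwise_comparisons(120)
print(pairs, checks)  # 7140 57120
```

Because the number of pairs grows with the square of the number of subimages, doubling the images in a paper roughly quadruples the comparisons.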
Organisations like AACR or Taylor & Francis will have negotiated a specific package at cheaper rates tailored to their individual operations.
"Because of the manual oversight and the cost associated with its use, we currently use Proofig on relevant manuscripts when they are at a more advanced stage of review rather than on initial submission," Helen King, Head of Transformation & Product Innovation, at SAGE Publishing told us. "To date, it has flagged issues in nearly a third of the papers we have run through the software which then require further subject matter expertise to understand and interpret the results."
AI can't detect plagiarised images across different papers yet
The American Society for Clinical Investigation has also adopted Proofig, whilst other publishers like Frontiers have built their own tools. Wiley is using some kind of software too, whilst PLOS, Elsevier, and Nature are either open to or actively testing programmes, Nature first reported.
Although AI software is getting better at spotting dodgy data, it cannot catch all the different ways scientists can cheat. Proofig can check whether images appear duplicated within the same paper, but it can't yet catch cases where images have been plagiarised across different papers. To do that, the company would need to build a database of images scraped from published papers for comparison.
"The main challenge today for the community is big data," Kolodkin-Gal said. "If the publishers will not start working together to build a database of problematic images, [image plagiarism] will remain a problem. To develop AI, you must have big data."
Still, software like Proofig is a good start to clamping down on cheating and improving scientific integrity. "I do think it's a good development that publishers are starting to use software because it provides a bit of quality control to the publishing process," Bik said. "It will work as a deterrent. It will tell authors we're going to go to screen your paper for these types of duplications. I think it will not prevent cheating, but it will make it a little bit harder to cheat." ®