Boffins affiliated with dev tools biz JetBrains and HSE University in Moscow have devised an open-source plugin for the company's Java development editor that guards against copy-and-paste coding.
AntiCopyPaster, available on GitHub, works with IntelliJ IDEA, JetBrains' integrated development environment (IDE) for Java programmers. It was created by Anton Ivanov, Zarina Kurbatova, Yaroslav Golubev, Andrey Kirilenko, and Timofey Bryksin to help mitigate the problems that can accompany copied code.
In a paper posted to ArXiv, the researchers observe that while "[c]opying and pasting constitute an essential part of writing programming code," doing so can lead to maintenance headaches, security problems, and licensing issues.
"While there is nothing wrong with the copying and pasting as such, research also shows that having clones inside a project can make its maintenance more difficult due to overgrown codebases," the paper explains. "Fixing vulnerabilities across multiple duplicate instances can be difficult and lead to increased security risks."
Software licensing problems are also a possibility. A 2020 paper by some of the same researchers looked at code cloning in 24,000 Java projects on GitHub and found that almost 10 per cent of copied code blocks potentially violate their original licenses.
There is a lot of duplicate code floating around due to developers' inclination toward copypasta. As of 2017, about 70 per cent of the code on GitHub came from copied files. The enduring attraction of copied code has given rise to a faux book titled "Copying and Pasting from Stack Overflow" and t-shirts bearing that same quip.
Additionally, cut'n'pasting the same blocks of source within a project is a sign of poor programming and application design: a classic so-called code smell that ought to be detected and tackled.
AntiCopyPaster attempts to deal with copypasta by monitoring the IDE for pasted code. It scans the Java methods within the destination file to find duplicates.
The plugin does so by trimming away spaces and checking to see whether each method's body contains the code snippet as a substring. If it doesn't find a match, it goes further by tokenizing the code and looking for substantial similarities.
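For the curious, the two-stage check described above can be sketched in a few lines of Java. This is an illustration only, not the plugin's actual code: the class name, the method names, the crude regex tokenizer, the multiset overlap measure, and the 0.8 threshold are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a two-stage duplicate check; not AntiCopyPaster's real implementation.
final class CloneCheckSketch {
    // Assumed cutoff for "substantial similarity"; the article does not give the real value.
    static final double SIMILARITY_THRESHOLD = 0.8;

    // Crude tokenizer: identifiers/keywords, runs of digits, or any single non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    static String stripWhitespace(String code) {
        return code.replaceAll("\\s+", "");
    }

    // Stage 1: does the method body contain the snippet once whitespace is removed?
    static boolean containsIgnoringWhitespace(String methodBody, String snippet) {
        String needle = stripWhitespace(snippet);
        return !needle.isEmpty() && stripWhitespace(methodBody).contains(needle);
    }

    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(code);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Stage 2: fraction of the snippet's tokens that also occur in the method body,
    // counting repeated tokens (a multiset overlap).
    static double tokenOverlap(String methodBody, String snippet) {
        List<String> snippetTokens = tokenize(snippet);
        if (snippetTokens.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> available = new HashMap<>();
        for (String token : tokenize(methodBody)) {
            available.merge(token, 1, Integer::sum);
        }
        int shared = 0;
        for (String token : snippetTokens) {
            Integer left = available.get(token);
            if (left != null && left > 0) {
                available.put(token, left - 1);
                shared++;
            }
        }
        return (double) shared / snippetTokens.size();
    }

    // Exact substring match first; fall back to token similarity only if that fails.
    static boolean looksLikeDuplicate(String methodBody, String snippet) {
        return containsIgnoringWhitespace(methodBody, snippet)
                || tokenOverlap(methodBody, snippet) >= SIMILARITY_THRESHOLD;
    }
}
```

The cheap substring test runs first, so the costlier token comparison is only needed when the pasted code has been reformatted or lightly edited.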
But it also tries not to hector developers unnecessarily: the plugin waits a user-set amount of time after a paste operation to allow the copied code to be edited. Only if the cloned code is left unaltered does the plugin then move on to checking whether the pasted fragment is Java code and whether it's correctly constructed.
If so, AntiCopyPaster will run the snippet through its onboard Gradient Boosting Classifier model to check whether it's a suitable candidate for refactoring (revision) using IntelliJ IDEA's built-in Extract Method. This involves moving a subset of statements out of a method and into a new method that is called in their place.
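To show what an Extract Method refactoring looks like in practice, here is a small before-and-after sketch. The invoice example and every name in it are invented for illustration; they come from neither the paper nor the plugin.

```java
// Hypothetical before/after illustration of the Extract Method refactoring.
final class ExtractMethodSketch {
    // Before: the pasted discount logic sits inline in the loop.
    static double invoiceTotalBefore(double[] prices, double discountRate) {
        double total = 0.0;
        for (double price : prices) {
            // --- pasted fragment starts ---
            double discounted = price - price * discountRate;
            if (discounted < 0.0) {
                discounted = 0.0;
            }
            // --- pasted fragment ends ---
            total += discounted;
        }
        return total;
    }

    // After: the fragment lives in its own method and is called in place.
    static double invoiceTotalAfter(double[] prices, double discountRate) {
        double total = 0.0;
        for (double price : prices) {
            total += applyDiscount(price, discountRate);
        }
        return total;
    }

    // The extracted method: one place to fix bugs, reusable wherever the logic is needed.
    static double applyDiscount(double price, double discountRate) {
        double discounted = price - price * discountRate;
        if (discounted < 0.0) {
            discounted = 0.0;
        }
        return discounted;
    }
}
```

Both versions compute the same total; the difference is that any other code needing the discount rule can call `applyDiscount` rather than carrying its own copy.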
Developers who accept the suggested refactoring should end up with more manageable code and at least have a chance to catch potential problems in the original snippet.
The authors note that the AntiCopyPaster pipeline can be extended to look for other code imperfections.
"Overall, we hope that AntiCopyPaster can help developers maintain the quality of their projects by combating the propagation of code clones," the boffins' paper concludes. "We also hope that our research can inspire further work in the area." ®