Video US-based academics claim they have developed a machine-learning system that can beat Google's bot-detecting reCAPTCHA system.
Designed to stop automated scripts from creating accounts or purchasing tickets online en masse, reCAPTCHA v2 presents an image, or a series of images, and asks netizens to click on the portions that contain a specific object, such as a car or traffic light. This is supposed to defeat bots, which should, ideally, fail these simple challenges, whereas people should be able to breeze through.
The team from the University of Louisiana at Lafayette, though, reckon their ImageBreaker [PDF] application is able to pass reCAPTCHA v2's online image-recognition tests with 92.4 per cent success, and an average of 14.86 seconds per are-you-a-bot challenge. When the system is run offline against image challenges, it can crack the visual riddles at an even faster rate, 5.27 seconds, with 95 per cent accuracy, we're told.
Below is a video of the thing in action:
"It shows that the implementation of reCaptcha v2 does not conform to its motto well because we find that our bot can also solve the captchas with dynamic images even better than human labors," the team explained in a paper, shared with The Register late last month.
This is not the first time eggheads have shown how the widely used Google are-you-a-human system can be bypassed. Earlier this year, boffins at the University of Maryland showed how the audio version of the anti-bot tool could be thwarted, and previous efforts have demonstrated that the image-based filter could be defeated with deep-learning software.
The Louisiana team, however, took things a step further, we're told, by performing the entire attack online and on the fly, rather than downloading the images and solving the challenge offline. This is effectively the difference between theorizing that reCAPTCHA v2 can be beaten using AI, and actively demonstrating how the verification filter can be beaten in the wild. It means websites using version two of Google's technology could be swarmed by bots, if this academic study works as claimed and is weaponized.
To beat the system, the team built three different attack modules that each carry out a different task. The first module gets the image itself as well as the challenge type – such as, what object needs to be clicked on – and what the layout of the grid is.
A second module identifies objects within the image. This module, which uses machine-learning code, spits out a JSON array with the object name, a confidence score, and the grid numbers to click in order to solve the puzzle. Finally, the third module submits the answers, checks whether the challenge was successfully completed, and stops if so.
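To give a rough feel for what the second module's output might look like, here is a minimal Python sketch. It is not the team's code: the `Detection` class, the `cells_for_box` and `solve` helpers, the bounding-box input, the row-major grid numbering, and the 0.5 confidence cut-off are all assumptions made for illustration. It takes some made-up detector results, keeps the ones matching the challenge's object, and emits a JSON array with the object name, confidence score, and grid numbers.

```python
import json
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """One hypothetical detector result, in pixel coordinates."""
    name: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max)


def cells_for_box(box, width, height, rows, cols):
    """Return the 1-based, row-major grid numbers that a box overlaps."""
    x_min, y_min, x_max, y_max = box
    cell_w, cell_h = width / cols, height / rows
    first_col = max(0, int(x_min // cell_w))
    last_col = min(cols - 1, math.ceil(x_max / cell_w) - 1)
    first_row = max(0, int(y_min // cell_h))
    last_row = min(rows - 1, math.ceil(y_max / cell_h) - 1)
    return [r * cols + c + 1
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


def solve(detections, target, width, height, rows, cols, min_conf=0.5):
    """Build the JSON array of answers for the requested object category."""
    answers = []
    for det in detections:
        # Only confident detections of the requested category are kept.
        if det.name == target and det.confidence >= min_conf:
            answers.append({
                "name": det.name,
                "confidence": det.confidence,
                "grids": cells_for_box(det.box, width, height, rows, cols),
            })
    return json.dumps(answers)


# Made-up detections on a 300x300 image split into a 3x3 grid.
detections = [
    Detection("car", 0.91, (10, 10, 150, 90)),
    Detection("traffic light", 0.88, (210, 0, 290, 200)),
    Detection("car", 0.30, (200, 200, 300, 300)),
]
result = solve(detections, "car", 300, 300, 3, 3)
print(result)  # [{"name": "car", "confidence": 0.91, "grids": [1, 2]}]
```

In this toy run, the low-confidence car and the traffic light are dropped, leaving grid numbers 1 and 2 for a third module to click.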
The academics, who say that they have reported their findings to Google, claim their system exploits a fundamental design flaw in reCAPTCHA v2 that makes it easier for bots to solve the image puzzles.
"We argue that the essential flaw is the design of reCaptcha v2 changes the normal object recognition problem to an object category verification problem. It reduces a hard problem to an easier problem," they wrote. "For example, it gives the object category and asks the bot to check whether the grids contain that object. The design reduces the difficulty level of the challenge for a bot to solve."
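The reduction the team describes can be illustrated with a short Python sketch. This is a toy under stated assumptions, not the paper's method: the per-cell score dictionaries, the `verify_cells` helper, and the 0.5 threshold are all hypothetical. Because the challenge names the category up front, the bot never has to work out what every object in the picture is. It only has to answer a yes-or-no question for each grid cell.

```python
def verify_cells(cell_scores, target, threshold=0.5):
    """Return 1-based grid numbers whose score for `target` passes the threshold.

    cell_scores holds one dict per grid cell, mapping category -> score.
    """
    return [i for i, scores in enumerate(cell_scores, start=1)
            if scores.get(target, 0.0) >= threshold]


# Made-up classifier scores for a four-cell challenge asking for cars.
cell_scores = [
    {"car": 0.92, "bus": 0.40},
    {"tree": 0.75},
    {"car": 0.55, "truck": 0.52},
    {"bus": 0.81, "car": 0.10},
]
to_click = verify_cells(cell_scores, "car")
print(to_click)  # [1, 3]
```

Note that the third cell is ambiguous between car and truck, yet the bot does not need to settle that: a car score above the cut-off is enough to click.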
The team hopes that their work can be put to use by Google to strengthen reCAPTCHA v2 against automation.
"Once such a critical security mechanism is broken, bots can gain access to services they are not allowed," the team explained. "For this reason, it is crucial to keep captchas secure and reliable."
The paper, titled Bots Work Better than Human Beings: An Online System to Break Google’s Image-based reCaptcha v2, was written by Uni of Louisiana at Lafayette grad students Imran Hossen, Yazhou Tu, Fazle Rabby, and Nazmul Islam, along with assistant professor Xiali Hei, and China-based Jiaotong University professor Hui Cao. Their work is, to the best of our knowledge, not yet published in a journal.
A spokesperson for Google was not available for immediate comment. For what it's worth, there is a version three of reCAPTCHA, which is designed to identify and stop bots based on their activity, rather than challenging them with puzzles, although some boffins claim they can bypass [PDF] that filtering. ®