A group of Google scientists working on extracting numbers from StreetView images has discovered that their technology can also match humans at solving captchas.
The aim, according to their research paper (published on arXiv), was to automatically extract accurate street number data from StreetView images so as to improve Google Maps location information.
Prior work, the researchers write, extracted individual digits from an image, identified each one, and then reassembled the whole street number. This, however, is inefficient, so the group led by Ian Goodfellow focussed on taking an entire image and identifying all the numbers in it.
Testing their model on Google's StreetView House Numbers dataset (which contains 200,000 numbers), the researchers found they were able to match human accuracy of 98 per cent with “95.64 per cent coverage”.
To achieve that accuracy, the researchers spent six days training Google's DistBelief neural network modelling framework. The trained model was then applied to all the house numbers Google holds – well into the tens of millions. The less constrained dataset reduced coverage to 89 per cent while holding accuracy at the 98 per cent “equal to a human” threshold.
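The article does not spell out what “coverage” means. One common reading, assumed here, is that the model only transcribes an image when it is confident enough, and coverage is the share of images that clear that bar while accuracy on the transcribed ones stays at the target. The sketch below illustrates that trade-off on made-up numbers; the function name `coverage_at_accuracy` and the toy data are hypothetical and are not taken from the paper or from Google's code.

```python
from typing import Sequence


def coverage_at_accuracy(
    confidences: Sequence[float],
    correct: Sequence[bool],
    target_accuracy: float = 0.98,
) -> float:
    """Return the largest fraction of inputs that can be answered while
    keeping accuracy on the answered inputs at or above target_accuracy.

    Assumption: inputs are answered in order of descending confidence,
    which mimics raising or lowering a single confidence threshold.
    Ties in confidence are broken arbitrarily in this simplified version.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Most confident predictions first.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)

    best_k = 0  # largest prefix length that still meets the target
    hits = 0    # number of correct answers in the current prefix
    for k, i in enumerate(order, start=1):
        hits += bool(correct[i])
        if hits / k >= target_accuracy:
            best_k = k
    return best_k / n


# Toy data (invented for illustration): five predictions, one of them wrong.
confidences = [0.99, 0.95, 0.90, 0.80, 0.60]
correct = [True, True, True, False, True]

print(coverage_at_accuracy(confidences, correct, 0.98))  # 0.6
```

On this toy data only the three most confident predictions can be kept before the wrong one drags accuracy below 98 per cent, so coverage is 60 per cent. Harder, less constrained images would push more predictions below the bar, which is consistent with the drop in coverage described above.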
Getting numbers out of images is easy, says Google
The same model was then tested against Google's reCAPTCHA puzzle, achieving 99.8 per cent accuracy. The researchers write that while this doesn't render captchas useless, “the utility of distorted text as a reverse Turing test by itself is significantly diminished”.
As Google's Vinay Shet writes in a blog post, “the act of typing in the answer to a distorted image should not be the only factor when it comes to determining a human versus a machine”, and Google itself is reducing its “dependence on text distortions as the main differentiator between human and machine,” using captchas instead to “perform advanced risk analysis”. ®