Don't give it away, give it away, give it away now, bot busting biz tells reCAPTCHA data serfs

Instead of enriching Google, try making a market for click work

Analysis Internet companies depend on free labor. Companies like Amazon, Facebook and Google rely upon content creators who give their work away for the sake of platform participation or perhaps naive altruism.

A startup called Intuition Machines believes there's a better way, one that involves machine learning (software) and a blockchain (of course).

Google pioneered the art of harnessing latent labor online with its PageRank algorithm, which captures the work that goes into linking to favored websites and uses it to improve the relevance of its search results.

A decade ago, the Chocolate Factory acquired reCAPTCHA from computer scientists at Carnegie Mellon University and began turning the clicks of people trying to prove they're not bots into data that improves text digitization, image annotation and machine learning projects. Everyone benefits, but none more than Google.

Intuition Machines contends that the value of what's been euphemistically called mass collaboration, which it estimates to be 100 person-years of crowdsourced labor every day, could be better allocated through an auction-based system called hCaptcha, released earlier this year.

Internet users shouldn't get too excited – they won't be able to monetize their nearly worthless labor. But websites with enough users solving CAPTCHA (completely automated public Turing test to tell computers and humans apart) puzzles could turn these collected pennies into a bit of revenue. And companies that need data labeled could get access to a more efficient market for such tasks.

The hCaptcha team estimates that the cost to break reCAPTCHA v3 puzzles via hacking services is about $1 per 1,000 solves or less, or $0.001 or less per answer. And it puts the cost of labeling an image – using Amazon Mechanical Turk, for example – significantly higher, at $0.03 to $1 or more per image.

The difference in those two costs translates into billions of dollars in value collected by Google over the years via reCAPTCHA clicks, at least by the calculations of the hCaptcha team.
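The arithmetic behind that claim can be sketched in a few lines of Python. This is a rough illustration, not the hCaptcha team's own calculation: the per-answer and per-image prices are the estimates quoted above, while the `value_gap` helper, the one-million-answer volume, and the assumption that each CAPTCHA answer substitutes for exactly one paid image label are ours.

```python
# Figures quoted above (hCaptcha team estimates, as reported in this article).
SOLVE_COST_PER_ANSWER = 1.00 / 1000  # $1 per 1,000 solves -> $0.001 per answer
LABEL_COST_LOW = 0.03                # low end of the per-image labeling cost
LABEL_COST_HIGH = 1.00               # high end quoted ("$1 or more")


def value_gap(num_answers: int, label_cost: float) -> float:
    """Dollar gap between paying label_cost per answer and paying solver rates.

    Assumption (ours, not the article's): one CAPTCHA answer replaces one
    paid image label.
    """
    return num_answers * (label_cost - SOLVE_COST_PER_ANSWER)


if __name__ == "__main__":
    # Hypothetical volume, chosen only for illustration; not from the article.
    answers = 1_000_000
    print(f"low estimate:  ${value_gap(answers, LABEL_COST_LOW):,.2f}")
    print(f"high estimate: ${value_gap(answers, LABEL_COST_HIGH):,.2f}")
```

Even at the low end, the per-answer gap is roughly 30 times the solver price, which is why the total scales into large sums once the answer count reaches the volumes a service like reCAPTCHA handles.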

Intuition Machines claims it can help web publishers share in that bounty by auctioning click labor to the highest bidder. The company says large publishers that serve a lot of hCaptcha puzzles can generate a thousand dollars a month or more in Ethereum tokens.

"Behind hCaptcha lies the HUMAN Protocol, an open decentralized protocol for human labor that runs on the Ethereum blockchain," the company explained when it announced its beta test last year.

"This has many advantages: allowing 'open books' to prove we’re fairly distributing bounties, efficient micro-payments via Human Tokens (an EIP20-compatible token with a custom Bulk API), providing a novel mechanism to scale a two-sided market in a capital-efficient way, and more."

Winning bidders get to present website visitors with hCaptcha puzzles for tasks that benefit the bidder's data-gathering goals, such as object recognition, attribute detection, relevance ranking, boundary detection and identifying text in images. And even users get something out of the exchange in the form of better human-bot disambiguation, or so the company suggests.

Carrots and sticks

Google, according to Intuition Machines, has a disincentive to make its bot detection really good because doing so would reduce its ad revenue.

"If Google officially determines that a user seeing an ad or clicking a link was in fact a bot, it cannot charge for ads shown to that user," the company explains in a blog post on Wednesday. "This conflict of interest has severely limited the scope of Google's anti-bot ambitions."

The firm claims that Google hasn't developed a retroactive bot detection system – which could comb through log files to spot ad fraud and issue refunds – and says that's a sign that the ad biz isn't interested in making reCAPTCHA the best that it can be.

"Offering retroactive bot identification would open Google up to thorny questions of how to retroactively refund advertisers who spent money on that fraudulent traffic," the company says. "The reCAPTCHA product has thus stagnated for a decade."

To support that claim, the company notes that the price charged by services that solve reCAPTCHA challenges has not changed since 2016. So whatever improvements Google has made since then have not made its puzzles harder to crack, at least from a monetary standpoint.

The Register asked the company to provide more specific data about how hCaptcha and reCAPTCHA perform. We're told there are terms of service limitations that make this difficult, so the company is waiting for a third party to provide these numbers.

Along similar lines, clients have yet to give Intuition Machines clearance to talk about their use of hCaptcha. As we understand the situation, companies don't want to be seen relying on external vendors to improve internal machine learning competency.

What's more, CAPTCHA system comparisons, we understand, can be tricky. Simply switching from reCAPTCHA to hCaptcha can lead to a sharp reduction in bots creating fake accounts, but that's not necessarily due to superior technology. It may be because the bot scripts hitting the site have been tuned to attack reCAPTCHA. Scripts adjusted specifically to target hCaptcha might reduce its bot bounce rate.

The Register asked Google for comment but we've not heard back. ®
