40,000 Tinder pics scraped into big data service

Trove then disappears, as folks point out the privacy problem


Amid a storm of criticism, a set of facial images built by scraping the Tinder dating service has been pulled from Kaggle.

Developer Stuart Colianni had built the 40,000-strong set of “hoes” (the charming variable name* in his source code – more below in case that repo also dies) on the premise that facial datasets are generally too small to be useful.

The Kaggle page where he published the dataset now returns a 404.

The Register has asked Kaggle, whose terms and conditions forbid crawlers, to confirm the reason for the deletion.

At the GitHub page, Colianni attributes the removal to a request from Tinder.

In any jurisdiction with medium-strength privacy regulations, scraping and publishing the data without consent probably represents a breach.

For example, Australian privacy analyst Stephen Wilson of Lockstep told The Register that scraping a dating site is “an offence akin to theft by finding” (that is, if you find a suitcase stuffed with banknotes, you don't get to keep it; you have to try to find the owner).

Likewise, the popular hobby of inferring personally identifiable information from multiple datasets is a breach of privacy legislation in many countries.

Wilson notes that the word “public” almost never occurs in data privacy laws around the world. ®

*Bootnote: It's hard to accept the intentions as benign with code snippets like this:

# Iterate through list of subjects
for hoe in hoes:
    # Get the subject ID
    sid = hoe['_id']
    # Gets a list of pictures of the subject
    pictures = hoe['photos']
Screen grab from GitHub

Keep it classy

We're all hoes, it seems.


Biting the hand that feeds IT © 1998–2020