1.4bn records from HaveIBeenPwned offered for your analytical pleasure
Troy Hunt's Christmas trove is a splendid gift for security and data nerds
Security researcher Troy Hunt had better hope his anonymisation works: he's decided to offer up most of his “HaveIBeenPwned” data set for other security researchers to analyse.
He's deduped his nearly two-billion-record dataset – there's a lot of pwnage in the world, people – down to a domain-based 135-megabyte text file that records which domains have suffered dumps, and how many records were compromised in a breach.
There are also more complex records – thoroughly de-identified – that show overlaps, meaning how many people turn up in a given combination of breaches.
As Hunt writes, “There are 20 people that have been pwned in that unique combination of five websites.”
Even thinned out, the research dataset Hunt's now publishing amounts to nearly 2.4 million rows, enough to keep any big data scientist, nerd or tinkerer humming happily in the corner, extracting insights while the rest of the family is enjoying the traditional Christmas argument.
“For each row, you can then take the breach names and reconcile them to the list of breaches exposed in the API. What this means is that you can access all the other attributes of the incident”, he adds.
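For the tinkerers, the reconciliation Hunt describes is essentially a lookup keyed on breach name. The sketch below is a rough illustration only: the semicolon-separated row format, the field names (`Name`, `Domain`, `BreachDate`) and the sample breaches are all assumptions made up for the example, not the actual layout of Hunt's file or a guaranteed shape of his API's output.

```python
# Hypothetical stand-in for the list of breaches the API exposes.
# The entries and field names here are invented for illustration.
SAMPLE_BREACHES = [
    {"Name": "ExampleSiteA", "Domain": "a.example", "BreachDate": "2013-01-01"},
    {"Name": "ExampleSiteB", "Domain": "b.example", "BreachDate": "2015-06-30"},
]


def index_breaches(breach_list):
    """Build a name -> breach-details lookup from a list of breach dicts."""
    return {breach["Name"]: breach for breach in breach_list}


def reconcile(row_breach_field, breach_index, sep=";"):
    """Split a row's breach-name field and pair each name with its details.

    Names missing from the index are paired with None rather than raising,
    so an incomplete breach list does not abort the whole analysis.
    The separator is an assumption about the row format.
    """
    names = [name.strip() for name in row_breach_field.split(sep) if name.strip()]
    return [(name, breach_index.get(name)) for name in names]


if __name__ == "__main__":
    index = index_breaches(SAMPLE_BREACHES)
    for name, details in reconcile("ExampleSiteA; ExampleSiteB; NotListed", index):
        domain = details["Domain"] if details else "unknown"
        print(f"{name}: {domain}")
```

Swapping the sample list for the real list of breaches, and the separator for whatever the file actually uses, should be all that's needed to attach the other attributes of each incident to a row.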
So that Hunt doesn't get DDoS-ed, the file is available as a torrent, and he asks anyone grabbing it to leave it seeding for a while to save some of his bandwidth. ®