The anonymous CynoSure Prime “cracktivists” who two years ago reversed the hashes of 11 million leaked Ashley Madison passwords have done it again, this time untangling a stunning 320 million hashes dumped by Australian researcher Troy Hunt.
Hunt, of HaveIBeenPwned fame, released the passwords in the hope that people who persist in re-using passwords could be persuaded otherwise, by letting websites look up and reject common passwords. The challenge was accepted by the group of researchers who go by CynoSure Prime, along with German IT security PhD student @m33x and infoseccer Royce Williams (@tychotithonus).
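For readers wondering what "look up and reject" might mean in practice, here is a minimal sketch. It assumes the site holds a local set of SHA-1 hex digests of known-breached passwords; the three sample passwords and the helper names (`sha1_hex`, `build_blocklist`, `is_rejected`) are hypothetical stand-ins, not anything from Hunt's release or the HaveIBeenPwned service.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Return the upper-case SHA-1 hex digest of a password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def build_blocklist(known_passwords):
    """Hash every known-breached password once, for fast set lookups."""
    return {sha1_hex(p) for p in known_passwords}


def is_rejected(candidate: str, blocklist) -> bool:
    """True if the candidate's hash appears in the breached-password set."""
    return sha1_hex(candidate) in blocklist


# Hypothetical sample data standing in for a real breached-password list.
blocklist = build_blocklist(["123456", "password", "qwerty"])

print(is_rejected("password", blocklist))
print(is_rejected("correct horse battery staple", blocklist))
```

A signup form would run a check like `is_rejected` before accepting a new password, and ask the user to choose another if it returns true.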
The password databases Hunt mined for his release were sourced from various leaks, so it's not surprising that many hashing algorithms (15 in all) appeared in them, though most used SHA-1. That algorithm was handed its death-note some time ago, and replacing it became more urgent in February this year when boffins demonstrated a practical SHA-1 collision.
The other problem is that hashing is weaker protection than it is meant to be. Passwords are hashed because the operation is supposed to be irreversible: p455w0rd gets hashed to b4341ce88a4943631b9573d9e0e5b28991de945d, the hash gets stored in the database, and it's supposed to be impossible to get the password from the hash.
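In practice, "reversing" a hash usually means guessing: hash each candidate password and see whether the result matches a leaked digest. The sketch below shows that idea at toy scale. The word list, the sample passwords and the `recover` helper are illustrative assumptions, not a description of how MDXfind or Hashcat work internally.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the lower-case SHA-1 hex digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def recover(target_hashes, candidates):
    """Map each leaked digest to the candidate word that produces it."""
    targets = {h.lower() for h in target_hashes}
    found = {}
    for word in candidates:
        digest = sha1_hex(word)
        if digest in targets:
            found[digest] = word
    return found


# Hypothetical "leak": two guessable passwords and one that is not in the list.
leaked = [sha1_hex("p455w0rd"), sha1_hex("letmein"), sha1_hex("Zq!7vR#2kLm9")]
wordlist = ["password", "p455w0rd", "letmein", "dragon"]

cracked = recover(leaked, wordlist)
print(sorted(cracked.values()))  # ['letmein', 'p455w0rd']
```

The third hash stays unrecovered only because its password is not in the word list. A fast, unsalted hash does nothing to slow an attacker who can try billions of guesses.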
The 15 different hashing algorithms in use were identified using the MDXfind tool.
Along the way, the CynoSure Prime post looks at Hunt's methodology and notes that some of the hashes cover more than just passwords (for example, there are email:password combinations and other varieties of personally identifiable information, which CynoSure Prime says Hunt didn't intend to release).
Hunt told The Register the CynoSure Prime people did some “pretty neat” work, and that they've been cooperative.
He agreed that the data leaks involved carried “a bit of junk” because the original owners made mistakes in parsing, and as a result the leaked user lists include names where only passwords are expected.
While some of this landed in his release, Hunt said, those data sets are in “two files that anyone could download with a few minutes' searching”. He's working with the CynoSure Prime data to purge it from the hashed lists hosted at HaveIBeenPwned.
When it comes to reversing the hashes, the post illustrates just how good the available tools have become: running MDXfind and Hashcat on a quad-core Intel Core i7-6700K system, with four GeForce GTX 1080 GPUs and 64GB of memory, the researchers “recover all but 116 of the SHA-1 hashes”.
[Figure: the distribution of character sets found by CynoSure Prime in the reversed passwords]
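A character-set breakdown of this kind can be produced by labelling each recovered password with the character classes it contains and counting the labels. The sketch below is one way to do that. The class names, the `charset_label` helper and the sample passwords are assumptions for illustration and do not reproduce CynoSure Prime's categories or figures.

```python
from collections import Counter
import string


def charset_label(password: str) -> str:
    """Describe which character classes a password draws on."""
    classes = []
    if any(c in string.ascii_lowercase for c in password):
        classes.append("lower")
    if any(c in string.ascii_uppercase for c in password):
        classes.append("upper")
    if any(c in string.digits for c in password):
        classes.append("digit")
    if any(c not in string.ascii_letters + string.digits for c in password):
        classes.append("special")
    return "+".join(classes) or "empty"


# Hypothetical sample standing in for a list of recovered passwords.
sample = ["p455w0rd", "letmein", "123456", "Password1!", "qwerty", "hunter2"]

distribution = Counter(charset_label(p) for p in sample)
for label, count in distribution.most_common():
    print(f"{label}: {count}")
```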
Most of the passwords in the HaveIBeenPwned release are between seven and 10 characters long. ®
Update: Thanks to CynoSure Prime's Fred Wang for expanding on, and correcting, some details in the original article.
First, we should make it clear that the Intel system referred to in the article was a test system only; the full-scale hash-cracking used dozens of machines. “That precise system was used as part of the cross-validation that we typically do as part of our research efforts”, Wang wrote.
While the CynoSure Prime post mentioned finding 15 hash algorithms using MDXfind, Wang said many others were discovered across the full release.
CynoSure Prime expressed other concerns about Troy Hunt's data release, particularly that his parsing errors resulted in personal information turning up in the hash database published at HaveIBeenPwned.
Vulture South put that to Hunt, and he agreed that would happen to some degree: “it wasn't a set of clean addresses with a semicolon and a password. We might find that 99 per cent is fine, and some subset of that is not.
“The clear text data is out there anyway”.
Since “the whole point of this is to help organisations not use bad passwords”, Hunt added, he'd welcome any input on “junk” in the dataset so it can be improved.