Google researchers and academics have today demonstrated it is possible – following years of number crunching – to produce two different documents that have the same SHA-1 hash signature.
This proves what we've long suspected: that SHA-1 is weak and can't be trusted. This is bad news because the SHA-1 hashing algorithm is used across the internet, from Git repositories to file deduplication systems to HTTPS certificates used to protect online banking and other websites. SHA-1 signatures are used to prove that blobs of data – which could be software source code, emails, PDFs, website certificates, etc – have not been tampered with by miscreants, or altered in any other way.
Now researchers at CWI Amsterdam and bods at Google have managed to produce two different PDFs that share the same SHA-1 hash value. That makes it a lot easier to pass off a meddled-with version as the legit copy. You could prepare two versions of, say, a contract, with different contents but identical hashes, get the harmless one approved, and then trick someone into thinking the tampered copy is the original.
SHA-1 is supposed to be deprecated but too many applications still support it, including the widely used source-code management tool Git. It is possible to create two Git repositories with the same head commit SHA-1 hash and yet with different contents: one could have a backdoor stealthily added, for example, and you wouldn't know it from the hash, because the two hashes would be identical.
Specifically, the team has successfully crafted what they say is a practical technique to generate a SHA-1 hash collision. As a hash function, SHA-1 takes a block of information and produces a short summary: a 160-bit digest, usually written as 40 hexadecimal characters. It's this summary that is compared from file to file to see if anything has changed. If any part of the data is altered, the hash value should be different. Now, in the wake of the research revealed today, security mechanisms and defenses still relying on the algorithm have been effectively kneecapped.
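As a rough illustration of that tamper check, the sketch below hashes two nearly identical messages with Python's standard hashlib module. The messages and the helper name sha1_hex are made up for the example; the point is only that an ordinary one-character edit produces a completely different 40-character digest, which is exactly the property a collision defeats.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


# Two hypothetical messages that differ by a single character.
original = b"Pay the supplier 100 euros."
tampered = b"Pay the supplier 900 euros."

print(sha1_hex(original))
print(sha1_hex(tampered))
print("digests match:", sha1_hex(original) == sha1_hex(tampered))
```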
Google's illustration of how changes made to a file can sneak under the radar by not changing the hash value
The gang spent two years developing the technique. It took 9,223,372,036,854,775,808 (that's 2^63) SHA-1 computations, 6,500 years of CPU time, and 110 years of GPU time, to get to this point. The team is made up of Marc Stevens (CWI Amsterdam), Elie Bursztein (Google), Pierre Karpman (CWI Amsterdam), Ange Albertini (Google), and Yarik Markov (Google), and their paper describing the work can be found here [PDF]. Its title is: "The first collision for full SHA-1."
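For a sense of scale, the back-of-the-envelope sketch below checks that the quoted number of SHA-1 computations is exactly 2^63 and compares it with a generic brute-force collision search. The brute-force figure of roughly 2^80 computations is our assumption, taken from the standard birthday bound for a 160-bit digest; it is not a number quoted in this article.

```python
# Number of SHA-1 computations quoted for the attack.
ATTACK_SHA1_CALLS = 9_223_372_036_854_775_808

# Assumption: a generic birthday-style collision search on a 160-bit
# digest needs on the order of 2^(160/2) = 2^80 hash computations.
BRUTE_FORCE_CALLS = 2 ** (160 // 2)

# For an exact power of two, bit_length() - 1 recovers the exponent.
attack_exponent = ATTACK_SHA1_CALLS.bit_length() - 1
speedup = BRUTE_FORCE_CALLS // ATTACK_SHA1_CALLS

print(f"attack cost: 2^{attack_exponent} SHA-1 computations")
print(f"speed-up over brute force: {speedup:,}x")
```

Under that assumption the technique comes out at 2^17, or a little over 100,000 times, cheaper than brute force.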
For all the gory details, and the tech specs of the Intel CPU and Nvidia GPU number-crunchers used, you should check out the team's research paper. On a basic level, the collision-finding technique involves breaking the data down into small chunks so that changes, or disturbances, in one set of chunks are countered by twiddling bits in other chunks. A disturbance vector [PDF] is used to find and flip the right bits.
A description of Google's SHA-1 colliding PDFs can be found here. We note that the files essentially each contain a large JPEG, and the hash collision is focused on that image data. We also note that you don't have to burn another few thousand years of CPU and GPU time to create more SHA-1 collisions for simple files: thanks to Google's computations, and quirks of the PDF file format, you can from here on out produce PDFs that are visually different but still have the same SHA-1 hash value. This online tool that popped up today will easily help you create colliding PDF files.
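If you are handed two files and want to know whether you are looking at a genuine copy or a SHA-1 collision, one simple approach is to hash both with SHA-1 and with a stronger function such as SHA-256. The sketch below does just that; the helper names file_digest and compare_files are our own, and the file paths are whatever you pass in.

```python
import hashlib


def file_digest(path: str, algorithm: str) -> str:
    """Hash a file in chunks so large PDFs need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_files(path_a: str, path_b: str) -> str:
    """Classify a pair of files as 'identical', 'collision' or 'different'."""
    same_sha1 = file_digest(path_a, "sha1") == file_digest(path_b, "sha1")
    same_sha256 = file_digest(path_a, "sha256") == file_digest(path_b, "sha256")
    if same_sha1 and same_sha256:
        return "identical"
    if same_sha1:
        # Same SHA-1 but different SHA-256: the contents differ.
        return "collision"
    return "different"
```

Calling compare_files on a pair of colliding PDFs should report "collision", because the two files share a SHA-1 value while their SHA-256 values differ.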
In other words, thanks to Google and co's research, it is now trivial for anyone to produce PDFs, webpages, and certain other simple documents that differ in content yet keep the same SHA-1 hash value.
The tech world is slowly moving from SHA-1 to newer and stronger algorithms such as SHA-256. We've known for a few years that SHA-1 was looking weak, and now its vulnerability to attack is on full display. This latest research underlines the importance of accelerating the transition to SHA-256 and stronger hashing routines.
It's unlikely anyone will create rogue SHA-1 hashes for complex and sensitive stuff like TLS certificates any time soon from the team's work, due to the amount of computing power required. However, it is not beyond the reach of a large corporation or intelligence agency to craft a SHA-1 collision, if it really, really wanted to. And this process will only get easier and cheaper over time as computers get faster. It's estimated the computing power needed to produce Google and co's single collision would cost about $130,000 at today's cloud spot prices.
"Today, 10 years after of SHA-1 was first introduced, we are announcing the first practical technique for generating a collision," the research team said today.
"This represents the culmination of two years of research that sprung from a collaboration between the CWI Institute in Amsterdam and Google ... For the tech community, our findings emphasize the necessity of sunsetting SHA-1 usage. Google has advocated the deprecation of SHA-1 for many years, particularly when it comes to signing TLS certificates. As early as 2014, the Chrome team announced that they would gradually phase out using SHA-1. We hope our practical attack on SHA-1 will cement that the protocol should no longer be considered secure."
David Chismon, senior security consultant at MWR InfoSecurity, told The Register: "The SHA-1 algorithm has been known to be weak for some years and it has been deprecated by NCSC, NIST, and many vendors. However, until today no real-world attacks have been conducted. Google's proof of concept, and the promise of a public release of tools may turn this from a hypothetical issue to a real, albeit expensive one.
"The attack still requires a large amount of computing on both CPUs and GPUs but is expected to be within the realm of ability for nation states or people who can afford the cloud computing time to mount a collision attack."
Google has tried to set the sun on SHA-1 by having its Chrome browser mark sites "insecure" if they have HTTPS certificates signed using SHA-1. Today's research is a further shot across the bows of those ploughing on regardless in relying on an obsolete cryptographic algorithm.
Chismon added: "Hopefully these new efforts of Google of making a real-world attack possible will lead to vendors and infrastructure managers quickly removing SHA-1 from their products and configuration as, despite it being a deprecated algorithm, some vendors still sell products that do not support more modern hashing algorithms or charge an extra cost to do so." ®