Git takes baby steps towards swapping out vulnerable SHA-1 hashing algo for SHA-256

It's proving a bit of a headache

The Git version control system has moved closer towards using SHA-256 rather than the compromised SHA-1 as its hash algorithm, to help protect code from tampering.

Whenever code is committed into a Git repository, the software calculates and stores a hash value for it. When the code is retrieved, the hash is recalculated to check that the code has not changed. Git also uses these hash values as database keys and to avoid storing the same content twice: if two hash values are the same, the content is presumed to be the same.
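To make the idea concrete, here is a toy content-addressable store. It is a minimal sketch, assuming a simple in-memory dictionary; the `ObjectStore` class and its methods are hypothetical, and real Git additionally hashes a type/length header and compresses objects on disk.

```python
import hashlib


class ObjectStore:
    """Hypothetical toy content-addressable store (not Git's real code).

    Each object is filed under the SHA-1 hash of its bytes, so the hash
    doubles as the database key.
    """

    def __init__(self):
        self._objects = {}  # maps hex digest -> raw bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Same content -> same key, so duplicates are stored only once.
        # If different data ever produced this key (a collision), it would
        # silently be treated as the object already stored.
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # Recompute the hash on retrieval to check the bytes are unchanged.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError(f"object {key} does not match its hash")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
first = store.put(b"print('hello')\n")
second = store.put(b"print('hello')\n")
print(first == second, len(store))  # True 1
```

The `setdefault` call is where a collision would bite: a second, different object with the same hash would never be stored, and anyone asking for that key would get the first object back.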

What this means is that the hashing algorithm is at the heart of how Git functions. Git uses SHA-1, but in early 2017 a team of Google engineers and others showed that SHA-1 can be broken: they demonstrated a technique for finding collisions, that is, different data that produces the same hash value. We reported on this here.

The potential consequences are serious because it means that code in a Git repository could be tampered with undetected. Since Git is the world's most popular version control system, that is a frightening possibility.

Last year, at the US Real World Crypto Symposium, researchers demonstrated a SHA-1 collision attack that is "much more threatening for real protocols".

Not so easy to break, nonetheless. Just ask Linus £$%^ Torvalds

In practice the weakness is not easy to exploit. Linus Torvalds, the creator of Git, said at the time: "The sky isn't falling. There's a big difference between using a cryptographic hash for things like security signing, and using one for generating a 'content identifier' for a content-addressable system like git."

Git's security does not rest on the hash algorithm alone, and finding collisions is not trivial. There is also the further challenge of finding a collision in which the tampered code still achieves some malicious goal. Torvalds also observed: "Git doesn't actually just hash the data, it does prepend a type/length field to it. That usually tends to make collision attacks much harder, because you either have to make the resulting size the same too, or you have to be able to also edit the size field in the header."
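The type/length field Torvalds mentions can be seen by computing an object name by hand. This sketch assumes a blob (file content) object; `git_object_id` is a hypothetical helper name, but the header layout it builds (the type, a space, the size in bytes, then a NUL byte) is the one Git hashes.

```python
import hashlib


def git_object_id(content: bytes, obj_type: str = "blob") -> str:
    """Compute a Git-style SHA-1 object name for raw content.

    Git hashes a header of the form "<type> <size>" plus a NUL byte,
    followed by the bytes themselves, so the length is baked into
    every object name.
    """
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


print(git_object_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Both values should match what `git hash-object` reports for an empty file and for a file containing `test content` plus a newline. Because the size is hashed along with the content, a forged object has to keep its header consistent with its length, which is the extra hurdle Torvalds describes.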

The weakness is real, though, and the Git team began investigating what it would take to replace SHA-1. A glance down this thread shows how challenging it is to replace the algorithm while maintaining compatibility with existing repositories. But a plan was formed, and the transition of Git's hash function to SHA-256 is documented here.

Nearly three years on, Git developer Brian M Carlson has now posted to the git developer mailing list a request for comments on "part 1 of 3 of a SHA-256 implementation". It is not a complete implementation since "you can create a SHA-256 repository, but will be unable to read it". Further: "It lacks support for cloning, fetching, and pushing, which while considered non-goals in the transition plan, are required for the test suite to even come close to passing."

Carlson has promised to follow up with part 2, covering interoperability between SHA-1 and SHA-256 repositories, and part 3, providing the missing pieces needed for the tests to run against the new version. You can also see the full work-in-progress implementation here.
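Interoperability is fiddly because every object ends up with two names, and the transition plan describes keeping a mapping between SHA-1 and SHA-256 names. Below is a much-simplified sketch, assuming blobs only: a blob's hashed bytes are the same under either algorithm, so both names can be computed directly. Trees and commits embed other objects' names, which would have to be rewritten before hashing with SHA-256; this sketch ignores that. The helper names are hypothetical.

```python
import hashlib


def object_id(content: bytes, algo: str, obj_type: str = "blob") -> str:
    # Same "<type> <size>" + NUL header for both algorithms; only the hash changes.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.new(algo, header + content).hexdigest()


def build_blob_translation_table(blobs):
    """Hypothetical helper: map SHA-1 names to SHA-256 names and back.

    Only valid for blobs, whose bytes contain no embedded object names.
    """
    sha1_to_sha256 = {}
    sha256_to_sha1 = {}
    for content in blobs:
        old_name = object_id(content, "sha1")
        new_name = object_id(content, "sha256")
        sha1_to_sha256[old_name] = new_name
        sha256_to_sha1[new_name] = old_name
    return sha1_to_sha256, sha256_to_sha1


forward, backward = build_blob_translation_table([b"", b"test content\n"])
for old_name, new_name in forward.items():
    print(old_name, "->", new_name)  # 40 hex digits -> 64 hex digits
```

Keeping the table in both directions lets a SHA-256 repository accept a SHA-1 name from an older peer, and translate its own names back when talking to one.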

"Because this series sets up (and documents!) a useless option (which is a large footgun) and because I'd like feedback about this approach, this series is RFC [Request for Comments]," Carlson said.

As Jonathan Corbet, editor of LWN, observed: "This work is unlikely to land on the systems of most Git users for some time yet." It has not been treated as urgent because of the difficulty in exploiting the SHA-1 weakness, but a fix will be most welcome. ®

Biting the hand that feeds IT © 1998–2022