Colliding Secure Hashes (da.vidbuchanan.co.uk)
89 points by Retr0id on Dec 22, 2023 | 25 comments


I like the conclusion. Is there a standard scheme for hash-function-then-key-derivation? You can easily tell users to run `sha256sum` but...

Though I suppose you should use public-key cryptography (e.g. GPG) if you want users to verify downloads to guard against attacks rather than just errors.

GPG/TLS/... should probably use different fingerprint algorithms though.


I think it's less about the algorithm and more about the representation, if the goal is for users to verify these things.

SSH displays pubkey art when generating a new key. We've had secret recovery phrases for a while; they're long, but really difficult to confuse with each other, even at a short glance. I think OpenPGP even had something like that. Then there's GitHub, which generates (or generated at some point) a unique profile picture for each user. There are many other similar approaches out there.

It would be possible to create a standard for representing SHA256/SHA512 hashes as human-comparable images, possibly enhanced with checksum words or with machine readability for the paranoid (without error-prone OCR).
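As a concrete example of the SSH approach: OpenSSH's randomart uses a "drunken bishop" walk, where each 2-bit group of the fingerprint moves a marker diagonally on a 17x9 board and each cell is drawn according to how often it was visited. Below is a simplified Python approximation of that idea (not OpenSSH's code, and without the border and header ssh-keygen prints, so don't expect byte-for-byte identical output). The picture is only a mapping of the input bits; any collision resistance still has to come from the hash you feed it.

```python
import hashlib

# Symbols indexed by visit count; 'S' and 'E' mark the start and end cells.
SYMBOLS = " .o+=*BOX@%&#/^SE"
WIDTH, HEIGHT = 17, 9


def randomart(data: bytes) -> str:
    """Render bytes as a 17x9 'drunken bishop' grid (no border or title)."""
    field = [[0] * WIDTH for _ in range(HEIGHT)]
    x, y = WIDTH // 2, HEIGHT // 2
    start = (x, y)
    max_index = len(SYMBOLS) - 1  # index of 'E'
    for byte in data:
        # Each byte gives four moves, one per 2-bit group, low bits first.
        for _ in range(4):
            x += 1 if byte & 0x1 else -1
            y += 1 if byte & 0x2 else -1
            # The bishop slides along the walls instead of leaving the board.
            x = max(0, min(x, WIDTH - 1))
            y = max(0, min(y, HEIGHT - 1))
            # Visit counts saturate below the 'S'/'E' markers.
            if field[y][x] < max_index - 2:
                field[y][x] += 1
            byte >>= 2
    field[start[1]][start[0]] = max_index - 1  # 'S'
    field[y][x] = max_index  # 'E' (overwrites 'S' if the walk ends where it began)
    return "\n".join("".join(SYMBOLS[c] for c in row) for row in field)


if __name__ == "__main__":
    print(randomart(hashlib.sha256(b"example").digest()))
```

A 32-byte SHA-256 digest gives 128 moves, which is plenty to make two unrelated digests look different at a glance.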


If your images/list of emojis/phrases are created from the bits of the SHA hash, then similarity there leads to similar images/... You need a hard-to-collide process in the middle anyway.


A strong hash is already difficult to collide. Your algorithm for representation doesn't have to map similar bytes or byte sequences to similar images, but that's just a mapping, not really a "hard-to-collide process".
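One way to see why a strong hash doesn't leak input similarity into the output: change a single character of the input and count how many output bits flip. For SHA-256 you'd expect roughly half of the 256 bits to differ (the avalanche effect). A minimal sketch, with made-up file names as inputs:

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of differing bits between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


if __name__ == "__main__":
    # Inputs differ in one character; roughly 128 of 256 bits should change.
    print(bit_difference(b"release-1.0.tar.gz", b"release-1.1.tar.gz"))
```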


> A strong hash is already difficult to collide

I don't understand. The whole point of the article is that this isn't true.


He demonstrates without a doubt that 128 bits of hash is not enough, yet you can still find many projects publishing MD5 hashes. The long tail of bad crypto is extremely long.
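For scale: with N = 2^n possible outputs, the usual birthday approximation p ≈ 1 - exp(-k^2/(2N)) says an ideal n-bit hash reaches a 50% collision chance after about sqrt(2N ln 2) ≈ 1.18 * 2^(n/2) random tries. That bound holds no matter how strong the hash's internals are, so a 128-bit output buys only about 2^64 collision resistance. A small calculator for that approximation:

```python
import math


def birthday_work(bits: int, probability: float = 0.5) -> float:
    """Approximate number of random hashes needed to find a collision.

    Solves p = 1 - exp(-k^2 / (2N)) for k, with N = 2**bits:
    k = sqrt(2 * N * ln(1 / (1 - p))).
    """
    n = 2.0 ** bits
    return math.sqrt(2.0 * n * math.log(1.0 / (1.0 - probability)))


if __name__ == "__main__":
    for bits in (96, 128, 160, 256):
        k = birthday_work(bits)
        print(f"{bits:3d}-bit hash: ~2^{math.log2(k):.1f} hashes for a 50% chance")
```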


As always, it depends on what properties you're relying on MD5 for. Just because something uses MD5 doesn't mean it's broken, because neither its preimage resistance (i.e. 'invert this hash') nor its second preimage resistance (i.e. 'find an input that goes to the same hash as this other input') is broken (yet) from a practical perspective.
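For reference, here are the three properties for an n-bit hash H, with the generic brute-force cost against an ideal hash. Real designs can fall short of these; MD5's practical collision attacks are far cheaper than the last line's bound.

```latex
\begin{aligned}
&\text{Preimage: given } y, \text{ find } x \text{ with } H(x) = y && \sim 2^{n} \\
&\text{Second preimage: given } x, \text{ find } x' \neq x \text{ with } H(x') = H(x) && \sim 2^{n} \\
&\text{Collision: find any } x \neq x' \text{ with } H(x) = H(x') && \sim 2^{n/2}
\end{aligned}
```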

Sometimes whether a cryptographic protocol relies on collision resistance can be surprisingly nuanced, so MD5 should be phased out for that reason alone (and because we have better options), but for simple uses (e.g. a signed hash of an executable, which is probably equivalent to what you're describing) it's not broken.


Or you can just use SHA-256 instead and not bother with subtle details about which uses are safe and which are not.


Or you can't, for example because you already signed a 20-year root with SHA-1. Nor does it matter in this example and many others.


Still depends. Using SHA-256 for password storage is bad. Argon2 would be a much safer bet. Or maybe scrypt or yescrypt.
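A minimal sketch of salted, deliberately slow password hashing with Python's standard library. Argon2 isn't in the stdlib (it needs a third-party package), so this uses hashlib.scrypt, which requires Python built against OpenSSL 1.1 or newer. The cost parameters are assumptions to tune for your hardware, not recommendations from the thread:

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (an assumption): n=2**14, r=8 uses about 16 MiB.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1
KEY_LEN = 32


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(
        password.encode(), salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=KEY_LEN
    )


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage, using a fresh random salt."""
    salt = os.urandom(16)
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), stored_key)
```

The per-password salt defeats precomputed tables, and the memory-hard cost makes each guess expensive, which is exactly what a fast hash like SHA-256 doesn't give you.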


And blindly use a suggestion from HN without understanding the tradeoffs.


They often propose verifying downloads with it, and it's pretty easy for a project insider to build two artifacts with the same MD5, one clean and one not...


Aren't you already trusting the maintainer by downloading and running their software? An evil maintainer can publish any hash they want, so why would they go to the trouble of making a hash collision?


In some setups, you're required to trust both the maintainer and the mirror, which are not always the same party. If someone can generate a collision, it means a mirror can mount an attack even when the maintainer is correctly trusted.


No, that would require second preimage.


You can pass review that way. You publish a clean artifact that gets reviewed and vetted, and for the actual attack you replace the vetted artifact with the bad one. If you trust MD5, or as the article shows, even a good 128-bit hash like truncated SHA-256, you get pwned. That's also why you shouldn't accept MD5-based signatures.
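To make the insider scenario concrete: the attacker doesn't need a second preimage of someone else's file, only a collision between two files they control. This toy uses SHA-256 truncated to 24 bits as a stand-in for a weak hash and varies a harmless-looking build counter in both files, so a birthday search finds a clean/evil pair after a few thousand tries per side. The file contents are invented, and real MD5 attacks use dedicated collision techniques rather than brute force:

```python
import hashlib


def truncated_sha256(data: bytes, bits: int) -> int:
    """Leading `bits` bits of SHA-256 as an integer (a deliberately weakened hash)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)


def find_clean_evil_pair(bits: int = 24) -> tuple[bytes, bytes]:
    """Vary a counter in both artifacts until a clean and an evil one share a hash."""
    clean_seen: dict[int, bytes] = {}
    evil_seen: dict[int, bytes] = {}
    i = 0
    while True:
        # Hypothetical artifacts; the counter is the only freedom the insider uses.
        clean = b"clean release\n# build " + str(i).encode()
        evil = b"evil release\n# build " + str(i).encode()
        hc, he = truncated_sha256(clean, bits), truncated_sha256(evil, bits)
        if hc == he:
            return clean, evil
        if hc in evil_seen:
            return clean, evil_seen[hc]
        if he in clean_seen:
            return clean_seen[he], evil
        clean_seen[hc] = clean
        evil_seen[he] = evil
        i += 1
```

The reviewer vets the clean file, and its (truncated) hash vouches equally for the evil one.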

People who take security seriously enough to check hashes shouldn't trust MD5, so the scenario isn't super credible, but people still publish MD5 hashes like it's the early 2000s.


That’s what I always wonder too. If we assume an attacker can change the contents at example.com/app.zip, why should we assume the hash published at example.com/download.html is any more secure?


Depends on what the artifact is.


Unless the attacker has a big influence on the "good" binary, they need to find a second preimage, not a collision, for this to be a problem. The real problems with publishing hashes are that the source of the hash is generally just as (un)trustworthy as the binary itself, and that nobody bothers verifying them in the first place.


MD5 is also broken for much worse reasons than just being 128 bits long: you can generate collisions nearly instantly.


Technically, a software project does not have to worry about collisions for hashes verifying their releases. That only becomes a problem where there is some sort of trusted third party that can be tricked into authenticating one of a pair of collided hashes.

There is a usability issue here. A 256 bit hash is so long as to be very hard to manually compare. A project might want to keep the length down to something reasonable.


Or find ways to easily compare them without doing it manually.
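Comparing in code instead of by eye is easy; `sha256sum -c` does this for a checksum file. A Python equivalent, assuming the published digest is a hex string:

```python
import hashlib
import hmac


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large downloads needn't fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return hmac.compare_digest(file_sha256(path), published_hex.strip().lower())
```

This only helps against a tampered file if the digest itself came from a more trustworthy channel than the download.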


Chopping SHA-256 (dropping the first x bits) down to 180 bits is probably the most secure way to go; see "Block-Cipher-Based Tree Hashing" by Aldo Gunsing. Bitcoin uses RIPEMD-160, so that length is probably fine, but RIPEMD-160 has other properties that make chopped SHA-256 a better idea (length extension attacks, for example).
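A sketch of the truncation, assuming "chopping" keeps the leading 180 bits (for a good hash, which bits you keep shouldn't matter). 180 bits is 45 hex digits, and by the birthday bound it leaves about 2^90 collision resistance; withholding 76 bits of the output also means an attacker doesn't see the full internal state a length extension attack needs:

```python
import hashlib


def sha256_truncated_hex(data: bytes, bits: int = 180) -> str:
    """Keep the leading `bits` bits of SHA-256 and render them as hex.

    Restricted to multiples of 4 so the result is a whole number of hex digits.
    """
    if not 0 < bits <= 256 or bits % 4:
        raise ValueError("bits must be a multiple of 4 between 4 and 256")
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return format(digest >> (256 - bits), f"0{bits // 4}x")
```

For multiples of 4 bits this is the same as taking the first 45 characters of `hashlib.sha256(data).hexdigest()`.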


Not really. He generated some partially colliding 96-bit hashes, then threw away non-matching characters, and claimed to have a collision.

If you let me pick which bits matter for collision detection, then I can make any secure hash collide with itself.
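The cherry-picking point is easy to check: two unrelated SHA-256 digests agree on roughly half of their 256 bit positions, so if you may choose which positions "count" after the fact, you get a ~128-bit "collision" for free. A genuine n-bit collision fixes the positions (e.g. the leading 96 bits) before the search. A minimal sketch:

```python
import hashlib


def agreeing_positions(a: bytes, b: bytes) -> list[int]:
    """Bit positions (0 = most significant) where SHA-256(a) and SHA-256(b) agree."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # Set bits in `same` mark positions where the digests match.
    same = ~(da ^ db) & ((1 << 256) - 1)
    return [i for i in range(256) if (same >> (255 - i)) & 1]
```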

On a related note, someone took the end of Contact to heart, and definitively proved that God is a Where's Waldo? fan:

https://kundor.github.io/Finding-Waldo/


I never threw away any non-matching characters, it's a full 96-bit collision, period.



