I like the conclusion. Is there a standard scheme for hash-function-then-key-derivation? You can easily tell users to run `sha256sum` but...
Though I suppose you should use public-key cryptography (e.g. GPG) if you want users to verify downloads to guard from attacks rather than just errors.
GPG/TLS/... should probably use different fingerprint algorithms though.
I think it's less about the algorithm, it's more about the representation. If the goal is that users verify these things.
SSH displays pubkey art when generating a new key. We've had secret recovery phrases for a while, they're long but really difficult to confuse with each other even if it's just a short glance. I think OpenPGP had something like that even. Then there's also GitHub that generates (or generated at some point) a custom unique profile picture for users. There are many other similar approaches out there.
It would be possible to create a standard for representing SHA256/SHA512 hashes as human-comparable images. Possibly enhanced with checksum words or with machine readability for those paranoid (without error-prone OCR).
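For illustration, here is a minimal sketch of one possible text representation, using a proquint-style encoding (16 bits per pronounceable five-letter word). This is not an existing standard for file hashes; the function names `pronounceable` and `fingerprint` are made up for this example.

```python
import hashlib

CONSONANTS = "bdfghjklmnprstvz"  # 16 symbols, 4 bits each
VOWELS = "aiou"                  # 4 symbols, 2 bits each


def pronounceable(data: bytes) -> str:
    """Encode bytes as consonant-vowel words, 16 bits per word."""
    if len(data) % 2:
        raise ValueError("need an even number of bytes")
    words = []
    for i in range(0, len(data), 2):
        n = int.from_bytes(data[i:i + 2], "big")
        words.append(
            CONSONANTS[(n >> 12) & 0xF]
            + VOWELS[(n >> 10) & 0x3]
            + CONSONANTS[(n >> 6) & 0xF]
            + VOWELS[(n >> 4) & 0x3]
            + CONSONANTS[n & 0xF]
        )
    return "-".join(words)


def fingerprint(payload: bytes) -> str:
    """Hypothetical helper: SHA-256 the payload, then encode the digest."""
    return pronounceable(hashlib.sha256(payload).digest())


# A full SHA-256 digest becomes 16 words of 5 letters each.
print(fingerprint(b"example release contents"))
```

Sixteen words is still a lot to compare by eye, so this mostly shows the mechanics rather than solving the usability problem.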
If your images/lists of emojis/phrases are created from the bits of the SHA hash, then similarity there leads to similar images/... You need a hard-to-collide process in the middle anyway.
A strong hash is already difficult to collide. Your algorithm for representation doesn't have to map similar bytes or byte sequences to similar images, but that's just a mapping not really a "hard-to-collide process".
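A quick way to see why the hash itself already does the scrambling: count how many digest bits differ between two nearly identical inputs. This is only a sketch; `bit_distance` and the sample file names are made up for the example.

```python
import hashlib


def bit_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between SHA-256(a) and SHA-256(b)."""
    x = int.from_bytes(hashlib.sha256(a).digest(), "big")
    y = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(x ^ y).count("1")


# Inputs differing in a single character; for a well-behaved hash you
# would expect roughly half of the 256 output bits to flip.
print(bit_distance(b"release-1.0.tar.gz", b"release-1.0.tar.gy"))
```

So a plain mapping from digest bits to pictures or words inherits that behaviour, as long as it uses all the bits.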
He demonstrates without a doubt that 128 bits of hash is not enough, yet you can still find many projects publishing MD5 hashes. The long tail of bad crypto is extremely long.
As always, it depends what properties you're relying on MD5 for. Just because something uses MD5 doesn't mean it's broken, because its preimage resistance (i.e. 'invert this hash') and second preimage resistance (i.e. 'find an input that goes to the same hash as this other input') are both not broken (yet) from a practical perspective.
Sometimes whether a cryptographic protocol relies on collision resistance can be surprisingly nuanced, so it should be phased out for this alone (and as we have better options) but for simple examples (e.g. to make a signed hash of an executable, which is probably equivalent to what you're describing) it's not broken.
They often propose verifying downloads with it, and it's pretty easy for a project insider to build two artifacts with the same MD5, one clean and one not...
Aren't you already trusting the maintainer by downloading and running their software? An evil maintainer can publish any hash they want, so why would they go to the trouble of making a hash collision?
In some setups, you're required to trust both the maintainer and the mirror, which are not always the same party. If someone can generate a collision, it means a mirror can mount an attack even when the maintainer is correctly trusted.
You can pass review that way. You publish a clean artifact that gets reviewed and vetted, and for the actual attack you replace the vetted artifact with the bad one. If you trust MD5, or as the article shows, even a good 128-bit hash, like truncated SHA-256, you get pwned. That's also why you don't accept MD5-based signatures.
People who take security seriously enough to check hashes should not trust MD5, so the scenario is not super credible, but people still publish MD5 hashes like it's the early 2000s.
That’s what I always wonder too. If we assume an attacker can change the contents at example.com/app.zip, why should we assume the hash published at example.com/download.html is any more secure?
Unless the attacker has a big influence on the "good" binary, they need to find a second pre-image, not a collision, for this to be a problem. The real problem with publishing hashes is that the source of the hash is generally just as (un)trustworthy as the binary itself, and that nobody bothers verifying them in the first place.
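To put rough numbers on the collision vs. second pre-image distinction, here is a sketch of the generic (brute-force) costs for an ideal n-bit hash: about 2^(n/2) work for a collision (birthday bound) and about 2^n for a second pre-image. These are textbook bounds, not measurements; MD5's real collision cost is far below its generic bound, and `generic_attack_cost_log2` is a made-up helper name.

```python
def generic_attack_cost_log2(bits: int) -> dict:
    """log2 of the brute-force work against an ideal hash of `bits` bits."""
    return {
        "collision": bits / 2,           # birthday bound
        "second_preimage": float(bits),  # must hit one specific digest
    }


for label, bits in [("128-bit (e.g. truncated SHA-256)", 128),
                    ("180-bit truncation", 180),
                    ("full SHA-256", 256)]:
    cost = generic_attack_cost_log2(bits)
    print(f"{label}: collision ~2^{cost['collision']:.0f}, "
          f"second pre-image ~2^{cost['second_preimage']:.0f}")
```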
Technically, a software project does not have to worry about collisions for hashes verifying their releases. That only becomes a problem where there is some sort of trusted third party that can be tricked into authenticating one of a pair of collided hashes.
There is a usability issue here. A 256 bit hash is so long as to be very hard to manually compare. A project might want to keep the length down to something reasonable.
Chopping SHA-256 (dropping the first x bits) down to 180 bits is probably the most secure way to go; see Block-Cipher-Based Tree Hashing by Aldo Gunsing. Bitcoin uses RIPEMD-160, so the length is probably fine, but chopped SHA-256 has other properties that make it a better idea (resistance to length extension attacks, for example).
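A minimal sketch of what that chopping could look like, assuming "drop the first x bits" means keeping the trailing bits of the digest. The function name `chopped_sha256` and the `keep_bits` parameter are made up for this example.

```python
import hashlib


def chopped_sha256(data: bytes, keep_bits: int = 180) -> str:
    """SHA-256 with the leading bits dropped, as lowercase hex."""
    if not 0 < keep_bits <= 256 or keep_bits % 4:
        raise ValueError("keep_bits must be a multiple of 4 in 4..256")
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    kept = full & ((1 << keep_bits) - 1)  # mask off the leading bits
    return format(kept, "0{}x".format(keep_bits // 4))


# 180 bits is 45 hex characters instead of 64.
print(chopped_sha256(b"example release contents"))
```

For 180 bits this is the same as taking the last 45 characters of the usual `sha256sum` output, so users could still check it with standard tools.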