A confusing thing that I think people mostly haven't addressed clearly:
For a hash (whether cryptographic or perceptual), there is a chance of random collisions and also a difficulty factor for adversarially-created intentional collisions. The random collision probability has to be estimated based on some model of the input and output space (for cryptographic hash functions, you would usually model them as pseudorandom functions and take the collision probability from the birthday-paradox calculation).
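To make that concrete, here's the standard birthday-bound approximation, treating the hash as a pseudorandom function over a b-bit output space (the 64-bit size and input count below are just placeholders for illustration):

```python
import math

def birthday_collision_probability(n_inputs: int, output_bits: int) -> float:
    """Approximate probability of at least one collision among n_inputs
    values drawn uniformly from a 2**output_bits space:
    p ~ 1 - exp(-n(n-1) / 2^(b+1))."""
    space = 2.0 ** output_bits
    return 1.0 - math.exp(-n_inputs * (n_inputs - 1) / (2.0 * space))

# Example: a billion images hashed to 64 bits.
print(birthday_collision_probability(10**9, 64))  # ~0.027
```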
Intentional collisions depend on insight about the structure of the hash function, and there are also different difficulty levels depending on the nature of the attack (preimage resistance, second-preimage resistance, and collision resistance). Gaining more insight about the structure of the hash function can reduce the work factor required to mount these attacks. That should be true for perceptual hashes just as much as cryptographic hashes, but presumably all of the intentional attacks start off easier against perceptual hashes, because their threat models are weaker and there's much less mathematical research on how to achieve those resistance properties for them.
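For reference, the generic (structure-free) attack costs against an ideal b-bit hash are roughly 2^b for preimage and second-preimage attacks and 2^(b/2) for collisions; a quick sketch:

```python
def brute_force_costs(output_bits: int) -> dict:
    """Textbook work factors against an ideal hash; cryptanalytic insight
    into the function's structure can only push these down."""
    return {
        "preimage": 2 ** output_bits,          # find x with H(x) == target hash
        "second_preimage": 2 ** output_bits,   # find x' != x with H(x') == H(x)
        "collision": 2 ** (output_bits // 2),  # any colliding pair, via birthday search
    }

print(brute_force_costs(128))  # md5's output size: 2^128 vs 2^64
```

md5 itself is the cautionary tale here: structural cryptanalysis pushed its collision cost from the generic 2^64 down to something a laptop can do in seconds.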
And in AI systems involving classifiers, it has generally been easy for people to create adversarial examples given access to the model. Perceptual hashes for estimating similarity to specific known images aren't exactly the same thing, because it's less like "how much like a cat is this image?" and more like "how much like NCMEC corpus image 77 is this image?", but maybe some of the same techniques would still work. In the cryptographic hash analogy, I guess that would be like trying to break preimage resistance (or second-preimage resistance, if the attacker starts from the original image).
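As a toy illustration of that style of attack, here's black-box hill climbing against a simple 64-bit average hash (nothing like NeuralHash's architecture; it just shows why "nudge the input until the hash matches the target" is plausible when small input changes move the hash gradually):

```python
import numpy as np

def average_hash(img: np.ndarray) -> np.ndarray:
    """Toy perceptual hash: threshold an 8x8 grayscale image at its own
    mean, giving 64 bits. Real schemes (pHash, NeuralHash) are fancier."""
    return (img > img.mean()).flatten()

def hill_climb_to_target(start, target_hash, steps=20000, rng=None):
    """Randomly perturb single pixels, keeping any change that doesn't
    increase the Hamming distance to target_hash."""
    rng = rng or np.random.default_rng(0)
    img = start.astype(float).copy()
    best = np.count_nonzero(average_hash(img) != target_hash)
    for _ in range(steps):
        y, x = rng.integers(0, 8, size=2)
        old = img[y, x]
        img[y, x] = np.clip(old + rng.normal(0, 16), 0, 255)
        dist = np.count_nonzero(average_hash(img) != target_hash)
        if dist <= best:
            best = dist       # keep the perturbation
        else:
            img[y, x] = old   # revert
        if best == 0:
            break
    return img

rng = np.random.default_rng(1)
target = average_hash(rng.uniform(0, 255, (8, 8)))  # stands in for "corpus image 77"
forged = hill_climb_to_target(rng.uniform(0, 255, (8, 8)), target, rng=rng)
print(np.count_nonzero(average_hash(forged) != target))  # typically reaches 0
```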
Thanks for the detailed explanation. I understand why that works for perceptual hashes if you make them really precise; however, I doubt it would work with md5, which is why I asked.
The discussion I thought we were having was about false positives, not adversarially induced false positives. For the former, a random collision between any given pair of inputs has a probability of 1/(2^64).
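At that rate the expected number of accidental matches is just (user images) × (database entries) / 2^64; plugging in made-up but generous numbers:

```python
# Illustrative placeholders, not Apple's real figures.
user_images = 100_000
db_entries = 1_000_000
print(user_images * db_entries / 2**64)  # ~5.4e-09, i.e. essentially never at random
```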
To mitigate adversarial false positives, one idea is to combine a cryptographically strong hash with a randomly selected perturbation of the file. Prior to hashing, perturb the file and submit both the hash and the selected perturbation to Apple. Apple selects the DB based on the perturbation and proceeds with matching and thresholding.
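A minimal sketch of that protocol, assuming a small fixed menu of perturbations so the server can precompute one database per perturbation (the byte-level XOR perturbation and the exact-match lookup are stand-ins; thresholding only makes sense for a perceptual matcher and is omitted here):

```python
import hashlib
import random

NUM_PERTURBATIONS = 16  # small fixed menu, one precomputed DB per entry

def perturb(file_bytes: bytes, idx: int) -> bytes:
    # Toy perturbation: XOR with a keystream derived from the index.
    stream = hashlib.shake_256(idx.to_bytes(4, "big")).digest(len(file_bytes))
    return bytes(a ^ b for a, b in zip(file_bytes, stream))

def client_submit(file_bytes: bytes) -> tuple[int, bytes]:
    """Client picks a perturbation at random, applies it, hashes with a
    strong hash, and sends both the index and the digest."""
    idx = random.randrange(NUM_PERTURBATIONS)
    return idx, hashlib.sha256(perturb(file_bytes, idx)).digest()

def server_match(idx: int, digest: bytes, dbs: list) -> bool:
    """Server routes the lookup to the DB built for that perturbation."""
    return digest in dbs[idx]

# Server side: precompute per-perturbation databases from the known corpus.
corpus = [b"known image 1", b"known image 2"]
dbs = [{hashlib.sha256(perturb(f, i)).digest() for f in corpus}
       for i in range(NUM_PERTURBATIONS)]

idx, digest = client_submit(b"known image 2")
print(server_match(idx, digest, dbs))  # True
```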
I thought that the random collision probability for md5 was way higher than that. If it's that low, you're right, this would work. I'm not sure I understand the part about the perturbation.