RE: Don't jump the gun

Schuetz, David (David_Schuetz nospam at tds.com)
Tue, 9 Mar 1999 08:03:25 -0800

> if you have two CDs with the same fingerprint, they'll produce
> the same MD5 hash. This is also why I gave myself the "out" by
> assuming a "random distribution" :-). The MD5 output is 128 bits...
> I wonder if that many bits go *in* :-)

It looks like it gets a bunch-o-bits:

* first track (normally one) - 1 byte
* last track - 1 byte
* leadout - 4 bytes
* frame offset - 4 bytes for each track

So, unless you have two disks with the same number of tracks, the same
4-byte leadout value, and each track starts at *exactly* the same frame
(there are 75 frames per second on an audio CD), then you won't have a *source
data* collision. MD5 then converts this large set of data (16 tracks == 70
bytes == 560 bits) into something more manageable. Obviously, by reducing
the bit count you increase the chance of collisions, but I've no idea how
likely that's going to be.
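
To make the byte counting concrete, here's a rough Python sketch of packing
those fields and hashing them. The function names, the big-endian byte order
and the track offsets are all my own assumptions for illustration; I haven't
checked how the real code lays the bytes out before feeding them to MD5.

```python
import hashlib
import struct


def pack_toc(first_track, last_track, leadout, frame_offsets):
    """Pack the fields listed above into one byte string.

    Assumed layout (hypothetical): 1 byte first track, 1 byte last track,
    4 bytes leadout, then 4 bytes per track offset, all big-endian.
    """
    fmt = ">BBI%dI" % len(frame_offsets)
    return struct.pack(fmt, first_track, last_track, leadout, *frame_offsets)


def md5_id(toc_bytes):
    """Reduce the packed data to a 128-bit MD5 digest, shown as hex."""
    return hashlib.md5(toc_bytes).hexdigest()


# Made-up 16-track disc: one track every 15000 frames, starting at frame 150.
offsets = [150 + 15000 * i for i in range(16)]
toc = pack_toc(1, 16, 150 + 15000 * 16, offsets)

print(len(toc), len(toc) * 8)  # 70 560
print(len(md5_id(toc)))        # 32
```

Two discs only get the same packed bytes if every field matches, so any
collision beyond that would have to come from MD5 itself.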

On the other hand, 128 bits --> 32 hex chars, so maybe we should just use the
full source data (track count, leadout, frame offsets) in Base64 (the MIME
encoding, not uuencode). The raw data would vary from 10 bytes for a
one-track CD to, say, 100 for a couple dozen tracks, and Base64 adds about a
third on top. That would give very long IDs for CDs with many tracks, but I
bet you'd *never* have a collision then (short of two CDs with identical
source data). Dunno. Long IDs can be ugly if humans ever have to deal with
'em.
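
For a feel of how long such IDs would get, here's a quick Python sketch that
Base64-encodes the raw fields instead of hashing them. Same caveats apply:
the helper names, the big-endian packing and the track offsets are
hypothetical, just enough to measure lengths.

```python
import base64
import struct


def raw_id(first_track, last_track, leadout, frame_offsets):
    """Base64-encode the unhashed source data (assumed big-endian layout)."""
    fmt = ">BBI%dI" % len(frame_offsets)
    data = struct.pack(fmt, first_track, last_track, leadout, *frame_offsets)
    return base64.b64encode(data).decode("ascii")


def id_length(track_count):
    """Length in characters of the Base64 ID for a made-up disc."""
    offsets = [150 + 15000 * i for i in range(track_count)]
    leadout = 150 + 15000 * track_count
    return len(raw_id(1, track_count, leadout, offsets))


# Columns: track count, raw bytes (6 + 4 per track), Base64 characters.
for n in (1, 16, 99):
    print(n, 6 + 4 * n, id_length(n))
# 1 10 16
# 16 70 96
# 99 402 536
```

So a 16-track CD already needs 96 characters against MD5's 32 hex chars, and
a 99-track disc (the most an audio CD can hold) would need 536.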

david.