Argh! TOCs

Nick Lamb (njl98r nospam at ecs.soton.ac.uk)
Thu, 22 Apr 1999 04:03:19 +0100 (BST)

Well, I've just examined the CDindex server code enough to figure out how
TOC data is stored in the DB. And now I'm not so very happy.

The plan I had when I was examining the code looked like this:

(1) About 20% or so of submissions are (or should be) duplicate entries
with new TOC data. The new TOC data will strongly resemble other TOC
data for the same entry, unless the new CD is a re-mastered version for
some reason (perhaps the digital masters were lost or of bad quality).

(2) Users hate navigating through Artist/Title search, picking a CD
and then realising they picked the wrong one (I've done that).

(3) So when submit.pl? is called, check the database for similar TOC
data and suggest any albums we have on file which look "right". If we
guess right the user saves a lot of time. Guess wrong and the user
will finish submission by the usual method.

--

But now I've seen how TOC data is stored -- "similar" matches on space delimited text won't generate the sort of results I want.

So I suggest that the Diskid table should contain three more fields to make it easier to do a "fuzzy" match from TOC data given to the server by the client...

(1) Number of tracks -- fuzzy match should EXACTLY match this field

(2) Offset #0 (leadout) -- for matching length (allow +/- 75 difference)

(3) Offset #N (last track) -- disambiguator function (+/- 75 diff)
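
As a rough illustration of how those three fields could be derived from a stored TOC, here is a minimal Python sketch. The layout it parses (leadout offset first, then one offset per track, all space delimited) is an assumption made for the example, not a description of the actual CDindex schema; `TocSummary` and `summarise_toc` are hypothetical names. Offsets are taken to be in CD frames, 75 per second, so a tolerance of 75 is about one second.

```python
from typing import NamedTuple


class TocSummary(NamedTuple):
    tracks: int    # number of tracks on the disc
    leadout: int   # offset #0 (leadout), in frames
    last: int      # offset #N (start of the last track), in frames


def summarise_toc(toc_text: str) -> TocSummary:
    """Reduce a space-delimited offset list to the three proposed fields.

    Assumed (hypothetical) layout: "<leadout> <track 1> ... <track N>".
    """
    offsets = [int(field) for field in toc_text.split()]
    if len(offsets) < 2:
        raise ValueError("need a leadout and at least one track offset")
    return TocSummary(tracks=len(offsets) - 1,
                      leadout=offsets[0],
                      last=offsets[-1])


# Made-up four-track disc, purely for illustration.
summary = summarise_toc("200000 150 18000 95000 160000")
```

Computing the summary once at submission time would mean the fuzzy match never has to parse the text field again.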

    $loleadout = $myleadout - 75;
    $hileadout = $myleadout + 75;
    $lolast    = $mylast - 75;
    $hilast    = $mylast + 75;

    SELECT DISTINCT album FROM Diskid
     WHERE tracks = $mytracks
       AND leadout >= $loleadout AND leadout <= $hileadout
       AND last >= $lolast AND last <= $hilast

This should give us 0 or more albums which are similar by meaningful criteria to the CD in the user's CD-ROM drive. The CDindex submit can offer them up as a pick-list, with an appropriate message.

By putting the fuzzy match method on the server, we can tweak it based on feedback from users, density of certain types of CD (perhaps never match CDs with only one track, where TOC data is thin anyway) and on database performance.
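
To make the "tweakable on the server" idea concrete, here is a small Python sketch using an in-memory SQLite database. Only the `Diskid` table name and the `album`, `tracks`, `leadout` and `last` fields come from the proposal; the column types, the `similar_albums` helper, its `tolerance` and `min_tracks` parameters and the sample rows are all assumptions for illustration, and the real server would use its own database layer.

```python
import sqlite3


def similar_albums(db, tracks, leadout, last, tolerance=75, min_tracks=2):
    """Return distinct albums whose TOC summary is close to the submitted one.

    tolerance and min_tracks are hypothetical tuning knobs: the first is the
    allowed +/- difference in frames, the second skips discs with too few
    tracks to guess from.
    """
    if tracks < min_tracks:
        return []
    rows = db.execute(
        '''SELECT DISTINCT album FROM Diskid
            WHERE tracks = ?
              AND leadout BETWEEN ? AND ?
              AND "last" BETWEEN ? AND ?
            ORDER BY album''',
        (tracks,
         leadout - tolerance, leadout + tolerance,
         last - tolerance, last + tolerance),
    ).fetchall()
    return [album for (album,) in rows]


# Made-up sample data, not taken from the real database.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE Diskid '
           '(album TEXT, tracks INTEGER, leadout INTEGER, "last" INTEGER)')
db.executemany("INSERT INTO Diskid VALUES (?, ?, ?, ?)", [
    ("Album A", 4, 200000, 160000),
    ("Album A", 4, 200030, 160020),  # same album, slightly different TOC
    ("Album B", 4, 200500, 160000),  # leadout too far away
    ("Album C", 5, 200000, 160000),  # wrong track count
    ("Single D", 1, 20000, 150),
])

matches = similar_albums(db, tracks=4, leadout=200010, last=160005)
single = similar_albums(db, tracks=1, leadout=20000, last=150)
```

With this sample data the first call offers only "Album A" and the one-track disc gets no suggestions. Changing the defaults changes the behaviour for every client at once, without anyone needing new client software.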

Nick.