Argh! TOCs

Nick Lamb (njl98r nospam at ecs.soton.ac.uk)
Thu, 22 Apr 1999 04:03:19 +0100 (BST)

Next message: robert nospam at moon.eorbit.net: "Re: Argh! TOCs"
Previous message: robert nospam at moon.eorbit.net: "Re: How about that autoconf'd ENDIAN stuff?"
Next in thread: Marc van Woerkom: "Re: Argh! TOCs"
Maybe reply: Marc van Woerkom: "Re: Argh! TOCs"

Well, I've just examined the CDindex server code enough to figure out how
TOC data is stored in the DB. And now I'm not so very happy.

The plan I had when I was examining the code looked like this:

(1) About 20% or so of submissions are (or should be) duplicate entries
with new TOC data. The new TOC data will strongly resemble other TOC
data for the same entry, unless the new CD is a re-mastered version for
some reason (perhaps the digital masters were lost or of bad quality).

(2) Users hate navigating through Artist/Title search, picking a CD
and then realising they picked the wrong one (I've done that).

(3) So when submit.pl? is called, check the database for similar TOC
data and suggest any albums we have on file which look "right". If we
guess right, the user saves a lot of time. If we guess wrong, the user
finishes submission by the usual method.

--

But now I've seen how TOC data is stored -- "similar" matches on
space-delimited text won't generate the sort of results I want.

So I suggest that the Diskid table should contain three more fields to
make it easier to do a "fuzzy" match from TOC data given to the server
by the client...

(1) Number of tracks -- fuzzy match should EXACTLY match this field
(2) Offset #0 (leadout) -- for matching length (allow +/- 75 difference)
(3) Offset #N (last track) -- disambiguator function (+/- 75 diff)

# Bounds are inclusive, so a difference of exactly 75 still matches.
$loleadout = $myleadout - 75;
$hileadout = $myleadout + 75;
$lolast = $mylast - 75;
$hilast = $mylast + 75;
(select distinct album from Diskid where tracks = $mytracks
                           and leadout >= $loleadout and last >= $lolast
                           and leadout <= $hileadout and last <= $hilast)
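
The query above can be tried out as a minimal sketch, here in Python
against an in-memory SQLite database. Everything except the Diskid table
name and the tracks/leadout/last/album fields is an assumption made for
illustration: the function names, the sample rows, and the choice of
inclusive bounds (one reading of "allow +/- 75 difference"). It is not the
CDindex server code.

```python
import sqlite3

TOLERANCE = 75  # the +/- 75 window proposed in the post


def similar_albums(conn, tracks, leadout, last, tolerance=TOLERANCE):
    """Return distinct albums whose track count matches exactly and whose
    leadout and last-track offsets are within `tolerance` (inclusive)."""
    rows = conn.execute(
        '''SELECT DISTINCT album FROM Diskid
           WHERE tracks = ?
             AND leadout BETWEEN ? AND ?
             AND "last" BETWEEN ? AND ?
           ORDER BY album''',
        (tracks,
         leadout - tolerance, leadout + tolerance,
         last - tolerance, last + tolerance),
    ).fetchall()
    return [row[0] for row in rows]


def demo_connection():
    """Build a throwaway database with made-up rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Diskid '
        '(album TEXT, tracks INTEGER, leadout INTEGER, "last" INTEGER)'
    )
    conn.executemany(
        "INSERT INTO Diskid VALUES (?, ?, ?, ?)",
        [
            ("Album A", 10, 200000, 180000),  # exact match
            ("Album A", 10, 200040, 180030),  # second pressing, same album
            ("Album B", 10, 200075, 179925),  # exactly on the boundary
            ("Album C", 10, 200076, 180000),  # leadout just outside
            ("Album D", 11, 200000, 180000),  # different track count
        ],
    )
    return conn


if __name__ == "__main__":
    print(similar_albums(demo_connection(), 10, 200000, 180000))
```

Passing the bounds as bound parameters rather than interpolating them into
the SQL string also keeps client-supplied TOC values out of the query text.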

This should give us 0 or more albums which are similar by meaningful
criteria to the CD in the user's CD-ROM drive. The CDindex submit can
offer them up as a pick-list, with an appropriate message.

By putting the fuzzy match method on the server, we can tweak it based on
feedback from users, density of certain types of CD (perhaps never match
CDs with only one track, where TOC data is thin anyway) and on database
performance.
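
One way to keep those knobs in a single place on the server is sketched
below in Python. The MatchPolicy name, its fields and the default
min_tracks value are hypothetical, chosen only to illustrate the idea of a
server-side policy that can be adjusted without touching clients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchPolicy:
    """Server-side tuning knobs for the fuzzy match (hypothetical)."""
    tolerance: int = 75   # allowed +/- difference in leadout and last offset
    min_tracks: int = 2   # skip CDs with fewer tracks, e.g. single-track CDs


def should_attempt_match(tracks, policy):
    """Decide whether a fuzzy match is worth running for this CD at all."""
    return tracks >= policy.min_tracks


def is_similar(candidate, submitted, policy):
    """Compare two (tracks, leadout, last) tuples under the given policy."""
    c_tracks, c_leadout, c_last = candidate
    s_tracks, s_leadout, s_last = submitted
    return (
        c_tracks == s_tracks
        and abs(c_leadout - s_leadout) <= policy.tolerance
        and abs(c_last - s_last) <= policy.tolerance
    )
```

Tightening or loosening the match then means changing one MatchPolicy
value on the server, for example MatchPolicy(tolerance=150).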

Nick.
