Re: Fuzzy matching

Nick Lamb (njl98r nospam at ecs.soton.ac.uk)
Mon, 12 Jul 1999 20:13:15 +0100 (GMT)

On Mon, 12 Jul 1999 robert nospam at moon.eorbit.net wrote:

> Subdividing the space on lead-out is not a good idea. Most of the lead
> outs that I've seen start at 150. That's no bueno. If I organize the
> database such that I can search on the first few tracks, if not all of
> them, then I could construct a query that returns quite a lot fewer
> items.

You've been awake too long, Robert. If you own a lot of CDs where the
lead-out starts at 150, you should sell them; they're all BLANK. (Track 1
conventionally starts at frame 150, so a lead-out there means no audio.)

Subdividing on the early (first?) track is a bad plan, because you'll find
that a lot of pop music producers have heard of the Golden Rules and
make all their tracks approximately the same length. This puts all
those CDs (and there are a lot of them; it's not called Popular Music
for nothing) into the same subdivision.

Instead, IIRC, I was suggesting that you use the lead-out entry. This
effectively marks the *length* of the CD. Conventions of album recording
mean that you'll still see some clustering, but it should give you
acceptable results (much better than first-track length).
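For concreteness, here is a minimal Python sketch of the lead-out idea.
It assumes each TOC is stored as a list of track start offsets plus the
lead-out offset, all in frames (75 per second); the 10-second bucket
width and the names bucket_key, build_index and candidates are
hypothetical, not part of any existing CDDB code. Looking in the
neighbouring buckets too means a disc whose lead-out sits near a bucket
edge isn't missed.

```python
from collections import defaultdict

FRAMES_PER_SECOND = 75                    # CD-DA frames (sectors) per second
BUCKET_FRAMES = 10 * FRAMES_PER_SECOND    # hypothetical bucket width: 10 s


def bucket_key(lead_out):
    """Map a lead-out offset (in frames) to a coarse bucket number."""
    return lead_out // BUCKET_FRAMES


def build_index(discs):
    """discs: dict of disc_id -> (track_offsets, lead_out), offsets in frames.
    Returns a mapping of bucket number -> list of disc ids."""
    index = defaultdict(list)
    for disc_id, (_tracks, lead_out) in discs.items():
        index[bucket_key(lead_out)].append(disc_id)
    return index


def candidates(index, lead_out):
    """Disc ids whose lead-out falls in the same or an adjacent bucket,
    so a query close to a bucket boundary still finds its neighbours."""
    k = bucket_key(lead_out)
    found = []
    for key in (k - 1, k, k + 1):
        found.extend(index.get(key, []))  # .get() does not create empty buckets
    return found
```

With made-up discs "a" (lead-out 180000), "b" (180300) and "c" (300000),
candidates(build_index(discs), 180100) returns ["a", "b"]: only the
discs of roughly the same total length get compared track by track.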

Even better: this distinguishes the case where albums are re-released
with an extra track for sales reasons, if we're not already ignoring it
at this stage because of the number of tracks. The first N tracks will
of course be the same in the re-release; only the increased length &&
the extra TOC entry differ.
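A rough check for the re-release case, again only a sketch: it assumes
the same (track_offsets, lead_out) representation in frames, and the
function name looks_like_rerelease and its tolerance parameter are
hypothetical. It simply encodes the observation above: one extra TOC
entry, matching first N offsets, and a later lead-out. A first-track key
sees the two discs as identical; the lead-out does not.

```python
def looks_like_rerelease(orig, cand, tolerance=0):
    """orig, cand: (track_offsets, lead_out) tuples, offsets in frames.

    True if cand has exactly one more track than orig, its first N
    offsets match orig's to within `tolerance` frames (hypothetical
    slack for remastering drift), and its lead-out is later."""
    orig_tracks, orig_lead = orig
    cand_tracks, cand_lead = cand
    if len(cand_tracks) != len(orig_tracks) + 1:
        return False
    # zip() stops at the shorter list, so this compares the first N tracks.
    for a, b in zip(orig_tracks, cand_tracks):
        if abs(a - b) > tolerance:
            return False
    return cand_lead > orig_lead
```

For example, looks_like_rerelease(([150, 20000, 40000], 180000),
([150, 20000, 40000, 180000], 200000)) returns True.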

Nick.