Re: Fuzzy matching

robert nospam at moon.eorbit.net
Mon, 12 Jul 1999 06:29:01 -0700 (PDT)

On 12 Jul, Nick Lamb wrote:
> On Mon, 12 Jul 1999 robert nospam at moon.eorbit.net wrote:
>
>> Subdividing the space on lead-out is not a good idea. Most of the lead
>> outs that I've seen start at 150. That's no bueno. If I organize the
>> database such that I can search on the first few tracks, if not all of
>> them, then I could construct a query that returns far fewer
>> items.
>
> Been awake too long Robert. If you own a lot of CDs where the Lead-OUT
> starts at 150, you should sell them, they're all BLANK.

Um. Uh, yeah: please strike "lead-out" from my message and replace it
with "track 1". Most track 1s start at frame 150 (the standard
two-second pregap, at 75 frames per second)...

> Subdividing on early (first?) track is a bad plan because you'll find
> out that a lot of Pop music producers have heard of the Golden Rules
> and they'll make all their tracks approximately the same length. This
> puts all those CDs (and there's a lot of them, it's not called Popular
> Music for nothing) into the same subdivision.

One track wouldn't be enough even if the Golden Rules didn't exist;
it would still return too many items.

> Instead I was suggesting, IIRC, that you use the LEAD OUT entry. This
> in effect marks the *length* of the CD. Conventions of album recording
> mean that you'll still see some clustering, but it should give you
> acceptable results (much better than 1st track length).
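A minimal sketch of the lead-out idea, assuming TOC offsets are stored in CD frames (75 per second) and using a hypothetical one-minute bucket width. Discs are grouped by lead-out; a lookup scans every bucket the tolerance window touches and then checks the exact distance. The names `build_index` and `lookup_by_leadout` and the sample offsets are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FRAMES_PER_SECOND = 75                   # CD audio frames (sectors) per second
BUCKET_FRAMES = 60 * FRAMES_PER_SECOND   # hypothetical bucket width: one minute

Index = Dict[int, List[Tuple[str, int]]]

def build_index(discs: Dict[str, int]) -> Index:
    """Group (disc_id, leadout) pairs by the lead-out bucket they fall into."""
    index: Index = defaultdict(list)
    for disc_id, leadout in discs.items():
        index[leadout // BUCKET_FRAMES].append((disc_id, leadout))
    return index

def lookup_by_leadout(index: Index, leadout: int, tolerance: int) -> List[str]:
    """Return ids of discs whose lead-out lies within `tolerance` frames of `leadout`."""
    first = (leadout - tolerance) // BUCKET_FRAMES
    last = (leadout + tolerance) // BUCKET_FRAMES
    hits = []
    # The tolerance window can straddle a bucket boundary, so scan every bucket it touches.
    for bucket in range(first, last + 1):
        for disc_id, other in index.get(bucket, []):
            if abs(other - leadout) <= tolerance:
                hits.append(disc_id)
    return hits

# Hypothetical lead-out offsets, in frames.
index = build_index({"a": 200000, "b": 200300, "c": 250000})
print(lookup_by_leadout(index, 200100, 300))  # ['a', 'b']
```

This only narrows the field. With so many albums clustering around similar lengths, one bucket may still hold a lot of candidates, so a per-track comparison would still be needed afterwards.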

That might work, but again as above, there may be too many matches with
that technique.

> Even better -- this distinguishes the case where albums are re-released
> with an extra track for sales reasons, if we're not already ignoring it
> at this stage because of the # of tracks. The first N tracks will of
> course be the same in the re-release; only the increased length &&
> extra TOC entry are different.

Eeeek. I think that takes this logic too far. After all, a CD that
has a different number of tracks should get a completely new entry in
the index. So a fuzzy search should eliminate all CDs that don't have
the same number of tracks...
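A sketch of that matching rule, assuming each TOC is kept as a tuple of track start offsets plus the lead-out, all in frames. The `Toc` type, the `fuzzy_match` name and the 150-frame (two-second) default tolerance are hypothetical choices, not anything settled in this thread. A disc with a different track count is rejected outright, so a re-release with a bonus track never matches the original.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Toc:
    offsets: Tuple[int, ...]  # start frame of each track
    leadout: int              # start frame of the lead-out

def fuzzy_match(a: Toc, b: Toc, tolerance: int = 150) -> bool:
    """True if both TOCs have the same track count and every offset is within tolerance."""
    if len(a.offsets) != len(b.offsets):
        return False  # different track count: treat it as a different CD entirely
    points_a = a.offsets + (a.leadout,)
    points_b = b.offsets + (b.leadout,)
    return all(abs(x - y) <= tolerance for x, y in zip(points_a, points_b))

# Hypothetical TOCs, in frames.
original = Toc((150, 20000, 40000), 60000)
other_pressing = Toc((150, 20020, 40030), 60010)
reissue = Toc((150, 20000, 40000, 60000), 75000)
print(fuzzy_match(original, other_pressing))  # True
print(fuzzy_match(original, reissue))         # False
```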

I think I'll be able to write one nasty SQL query that will perform the
whole match on the server. I don't know what that will do to the SQL
server...
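One way that single query might look, sketched with Python's built-in sqlite3 module rather than whatever SQL server the project actually runs. The schema (`disc`, `track`, and a temporary `query_track` table holding the offsets being looked up) and the helper names are hypothetical. The composite index on (num_tracks, leadout) should let the database narrow the candidates cheaply before the per-track NOT EXISTS check runs, which is the part most likely to load the server.

```python
import sqlite3
from typing import List, Sequence

SCHEMA = """
CREATE TABLE disc  (id TEXT PRIMARY KEY, num_tracks INTEGER, leadout INTEGER);
CREATE TABLE track (disc_id TEXT, track_num INTEGER, start_frame INTEGER);
CREATE INDEX disc_by_length ON disc (num_tracks, leadout);
CREATE INDEX track_by_disc ON track (disc_id, track_num);
CREATE TEMP TABLE query_track (track_num INTEGER, start_frame INTEGER);
"""

# Same track count, lead-out within tolerance, and no track off by more than tolerance.
MATCH_SQL = """
SELECT d.id FROM disc AS d
WHERE d.num_tracks = :n
  AND d.leadout BETWEEN :leadout - :tol AND :leadout + :tol
  AND NOT EXISTS (
      SELECT 1 FROM track AS t
      JOIN query_track AS q ON q.track_num = t.track_num
      WHERE t.disc_id = d.id AND ABS(t.start_frame - q.start_frame) > :tol)
ORDER BY d.id
"""

def add_disc(conn: sqlite3.Connection, disc_id: str,
             offsets: Sequence[int], leadout: int) -> None:
    conn.execute("INSERT INTO disc VALUES (?, ?, ?)", (disc_id, len(offsets), leadout))
    conn.executemany("INSERT INTO track VALUES (?, ?, ?)",
                     [(disc_id, n, frame) for n, frame in enumerate(offsets, start=1)])

def fuzzy_lookup(conn: sqlite3.Connection, offsets: Sequence[int],
                 leadout: int, tol: int = 150) -> List[str]:
    # Load the TOC being looked up into the temp table, then run the single query.
    conn.execute("DELETE FROM query_track")
    conn.executemany("INSERT INTO query_track VALUES (?, ?)", enumerate(offsets, start=1))
    rows = conn.execute(MATCH_SQL, {"n": len(offsets), "leadout": leadout, "tol": tol})
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_disc(conn, "album", [150, 20000, 40000], 60000)
add_disc(conn, "album-reissue", [150, 20000, 40000, 60000], 75000)
add_disc(conn, "other", [150, 30000, 41000], 60100)
print(fuzzy_lookup(conn, [150, 20030, 40010], 60020))  # ['album']
```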

--ruaok Freezerburn! All else is only icing. -- Soul Coughing

Robert Kaye -- robert nospam at moon.eorbit.net http://moon.eorbit.net/~robert