Re: [DB] Summary of Current Suggestions

August Zajonc (augustz nospam at bigfoot.com)
Wed, 10 Mar 1999 14:16:53 -0800

Ahhhh, the wonderful world of DB design...

It seems pretty clear that the queries are not going to come directly off
this master db, but even if they did it should be completely normalized.

Basically, that means there is going to be a table for just about
everything, from songs, to performers, to cds... Want to add a new genre? Add
an entry in the genre lookup table. For queries all this will then simplify
down to something easy to handle. But the richness remains. Someone could
code a client so there is a drop down list of artists on the data add page,
so that we can have little bios on the performers that don't need to be
entered more than once (as a far out example)... Someone else could code
clients which do other similarly gimmicky things...

It would be important, however, that the data stays relatively clean... On one
hand, selecting an artist from a list insures against silly typos; on the
other hand, when people submit without the list, multiple entries could
occur (something that happens anyways under other systems...) I don't think
this will be a huge issue, especially with a little coding server side to
help out. And since clients are connected to the net, they can check if an
artist exists, and we can really avoid some of the cruft that got into cddb
proper.
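
A minimal sketch of the normalized layout and the server-side "does this
artist already exist?" check described above. Everything here is an
assumption made for illustration: the table and column names are
hypothetical, Python's built-in sqlite3 stands in for the mySQL server, and
the matching rule (ignore case and stray whitespace) is just one possible
choice.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE genre  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE artist (id INTEGER PRIMARY KEY,
                     name TEXT NOT NULL UNIQUE COLLATE NOCASE,
                     bio  TEXT);
CREATE TABLE cd     (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                     artist_id INTEGER NOT NULL REFERENCES artist(id),
                     genre_id  INTEGER REFERENCES genre(id));
"""

def open_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def get_or_create_artist(db, name):
    """Return the id of an existing artist, or insert a new row."""
    name = " ".join(name.split())  # collapse stray whitespace
    # The NOCASE collation on artist.name makes this match ignore case.
    row = db.execute("SELECT id FROM artist WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    return db.execute("INSERT INTO artist (name) VALUES (?)", (name,)).lastrowid
```

With this, a submission of "peter  gabriel " resolves to the same artist row
as "Peter Gabriel" instead of creating a duplicate, and the bio is stored
once on that row.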

I've got a mySQL server ready to roll, let's first spec out what the cddb
replacement needs bare minimum, then work on the extras...

August

-----Original Message-----
From: Schuetz, David <David_Schuetz nospam at tds.com>
To: 'cdin nospam at cdin.org' <cdin nospam at cdin.org>
Cc: 'cdindex nospam at freeamp.org' <cdindex nospam at freeamp.org>;
'freecddb-developer nospam at bigred.lcs.mit.edu'
<freecddb-developer nospam at bigred.lcs.mit.edu>
Date: Wednesday, March 10, 1999 1:50 PM
Subject: [DB] Summary of Current Suggestions

Okay, here's my first summary of DB-related postings. Lots of ideas
floating around there. I'll follow-up shortly with a more formal posting
that could/should lead to proposals/specs.

[in no particular order:]

** Ragnar Kjørstad suggested that the DB resolve down to song titles, for
same-song on diff-cd support. Martin Nilsson suggests something similar,
even suggesting we use the ISRC encoded on some CDs for song identification.
I'm afraid that some/many/most CDs won't have reliable ISRC information, or
that not all players could get to it, so that'd be out. I'm really afraid
that this kind of relational approach, while really cool and great if you're
doing an "Internet Movie Database" approach to CDs, wouldn't be practical
when you've got fairly dumb, non-interactive clients shoving new data at a
server.

** don't store hashed IDs -- just store raw ToC data, let internal DB
implementations decide on a hash for internal speed, and store those in a
separate table. If necessary. This frees clients from any one hashing
algorithm.
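
One way to picture this, as a sketch only: the raw ToC is the stored record,
and the lookup key is derived server-side and kept in a separate index. The
function names, the dictionary "tables," and the choice of SHA-1 are all
assumptions for illustration; no particular hash has been proposed.

```python
import hashlib

def internal_key(track_offsets, leadout):
    """Derive a server-private lookup key from raw ToC data (frame offsets)."""
    raw = ",".join(str(n) for n in list(track_offsets) + [leadout])
    return hashlib.sha1(raw.encode("ascii")).hexdigest()

# The raw ToC is the record of truth; the key lives in a separate index
# that can be rebuilt from toc_table if the hash function ever changes.
toc_table = {}   # disc_id -> (track_offsets, leadout)
key_index = {}   # internal key -> disc_id

def store_disc(disc_id, track_offsets, leadout):
    """Store the raw ToC and index it under the derived key."""
    toc_table[disc_id] = (tuple(track_offsets), leadout)
    key_index[internal_key(track_offsets, leadout)] = disc_id

def find_disc(track_offsets, leadout):
    """Look a disc up by raw ToC; clients never see or compute the key."""
    return key_index.get(internal_key(track_offsets, leadout))
```

Clients only ever send raw offsets, so swapping the hash is a server-side
reindex rather than a protocol change.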

** Eddy Gurney points out that some kind of support for consumer jukeboxes,
that may only return MM:SS info (w/out frame info) for ToC data, would be
good. (Maybe the server can do fuzzy matches for them, but use frame and/or
leadout for systems that can to reduce collisions.)
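
A rough sketch of what such a fuzzy match could look like. CD audio positions
are counted in frames of 1/75 second; the rest is assumed for illustration:
that the jukebox reports whole seconds per track start, that a one-second
tolerance is acceptable, and the function names themselves.

```python
FRAMES_PER_SECOND = 75  # CD audio addresses are counted in 1/75 s frames

def to_seconds(frame_offsets):
    """Drop the frame part of each offset, keeping whole seconds."""
    return [f // FRAMES_PER_SECOND for f in frame_offsets]

def fuzzy_match(stored_frames, client_seconds, tolerance=1):
    """True if every track start agrees to within `tolerance` seconds."""
    stored = to_seconds(stored_frames)
    if len(stored) != len(client_seconds):
        return False
    return all(abs(s - c) <= tolerance for s, c in zip(stored, client_seconds))
```

A client that does have frame and leadout data would skip this path and match
exactly, which keeps the looser comparison from causing extra collisions.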

** Seems to me that we might also have user-defined abbrevs for songs,
artists to better support devices with limited display capabilities (ABITW2,
etc.)

** XML stuff might help us with sending specific datums back to the client
that might not be used everywhere. Or a "property,value" ordered pair sort
of delivery and storage might be in order. XML might make a good local
storage format...

** Larry Kain proposed a cool-named abstraction called "MuRuPu." Should we
dig into something really abstract, or stick closer to a representation of
physical media? Could have good uses when trying to do things like
classical music CDs, with multiple performances of basically the same
material.

** Extended character sets? Should at least leave that open. Seems that
UTF-8 has been suggested as a standard (I don't know enough about this to
comment).

** Multiple languages? Definitely, at least to support multiple title names
(some classical CDs have track titles in English and, say, French and
German)

** Genres -- Mike Bremford suggested three alternatives: CDDB's current
list, MP3 ID3 list, and "main entry with user entered keywords." I think we
could start with the main CDDB categories, then create subs, perhaps.

** Conflict resolution -- I'm inclined to stay with "Latest Entry Wins,"
simply for simplicity. Could we maybe have a "confidence" measure that
could be clicked, and when enough people say "yeah, that looks right," the
record is frozen, and re-submissions get bounced to some "authority" for
resolution? Like I said, "last entry wins" is a lot easier. At any rate,
this isn't a DB issue.
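
For comparison, a sketch of the "confidence" variant layered on top of
"latest entry wins." The threshold of three confirmations, the class and
attribute names, and the in-memory review queue are all hypothetical.

```python
FREEZE_THRESHOLD = 3  # hypothetical number of "looks right" clicks

class Record:
    """One disc entry with latest-entry-wins until it is frozen."""

    def __init__(self, data):
        self.data = data
        self.confirmations = 0
        self.review_queue = []  # re-submissions held for an "authority"

    @property
    def frozen(self):
        return self.confirmations >= FREEZE_THRESHOLD

    def confirm(self):
        """A user says the current entry looks right."""
        self.confirmations += 1

    def submit(self, data):
        """Apply a new submission; return False if it was bounced."""
        if self.frozen:
            self.review_queue.append(data)
            return False
        self.data = data          # latest entry wins
        self.confirmations = 0    # new data has to earn confidence again
        return True
```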

** Robin O'Leary would really like to see some sort of "supertrack" that
encompasses multiple tracks (for ease of proper shuffling choices, among
other reasons). This also allows for entering the "piece" title once (not
having to repeat it on each constituent track). I suggest that a time-based
approach would be better, simply because it offers more flexibility, though
some sort of direct indication of hierarchical location should be retained.
Doing it in a time-based manner also allows for "super tracking" pieces that
split in the middle of a track (not that I can think of any examples, but
hey...)

** I would like to see sub-tracks defined, for splitting out long pieces
(classical, Pink Floyd, etc.). Some CDs support built-in Index marks, but
most players don't support that, and certainly those aren't included in
CDDB. By taking the same approach as suggested for supertracks, we can skin
both cats at once.
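
A sketch of the time-based idea: every titled span, whether a track, a
supertrack, or a sub-track, is just a range on the disc's timeline. The
Segment type, the frame units, and the sample data are made up for
illustration, and nesting is inferred from containment here rather than
stored as an explicit hierarchy field.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    title: str
    start: int  # first frame, inclusive
    end: int    # last frame, exclusive

def titles_at(segments, frame):
    """Titles of all segments containing `frame`, widest first."""
    hits = [s for s in segments if s.start <= frame < s.end]
    return [s.title for s in sorted(hits, key=lambda s: s.start - s.end)]

# Hypothetical disc: one piece spanning two tracks, plus a sub-track.
segments = [
    Segment("Symphony", 0, 9000),   # supertrack over tracks 1 and 2
    Segment("Track 1", 0, 4000),
    Segment("Track 2", 4000, 9000),
    Segment("Coda", 8000, 9000),    # sub-track inside track 2
]
print(titles_at(segments, 8500))  # ['Symphony', 'Track 2', 'Coda']
```

Because spans are plain ranges, a supertrack that starts or ends in the
middle of a track needs no special case.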

** Last name? First name? Does it matter? Only really important when
trying to do a sort order. Maybe an "artist" field, and a "sort as" or
"file as" field. Enter "Peter Gabriel" in the first, and "Gabriel, Peter"
in the second. Maybe.
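
As a sketch, a client could pre-fill the "sort as" field with a guess and let
the submitter correct it. The function name and the "last word is the
surname" rule are assumptions, and the rule is deliberately naive.

```python
def default_sort_as(name):
    """Guess 'Last, First' from a display name; a submitter could override it."""
    parts = name.split()
    if len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"

artists = ["Peter Gabriel", "Madonna", "Kate Bush"]
print(sorted(artists, key=default_sort_as))
# ['Kate Bush', 'Peter Gabriel', 'Madonna']
```

The guess turns "Pink Floyd" into "Floyd, Pink", which is the argument for
storing "sort as" as its own field rather than computing it.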

** Titles, sub-titles, collection titles--suggest "standard" use for
collection titles as subs ("Billboard Best Music Vol. 6 -- The Yoedelers"
becomes "The Yoedelers" as primary title, "Billboard Best Music Vol. 6" as
subtitle).

** Many people have requested artist-per-track information, for
various-artist CDs. Might even want to add per-track album info, for those
compilations that list original artist and original album.

###