Re: a whole pile of questions

Darin Adler (darin nospam at spies.com)
Wed, 31 Mar 1999 10:55:27 -0800

>>[I ask about the format of the output of hget.pl]

>This is all pretty stable now. I've also included XML support --
>instead of using hget.pl or get.pl use xget.pl.

>>[I mention the issues in parsing the output of hget.pl]

>Use get.pl instead -- it returns a much simpler format.

Since writing my note, I've noticed the new XML option, and switched to
using the XML format instead of the hget.pl one.

>Right now we're using the ISO-88591 (is that the right number?)
>character set, but I would like to move to UNICODE in the future.

I think that XML implies Unicode (in some kind of 8-bit encoding). Isn't
that right? The XML parsing library I'm using returns all the text to me
in Unicode format (or in UTF-8 if I want to use 8-bit characters
internally) and I assume it's handling the character set issue for me.
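To illustrate what I mean by the parser handling the character set, here is a minimal sketch in Python (not the library I'm actually using). The element names and the sample document are made up for illustration; they are not the real xget.pl output format. The point is only that the parser reads the encoding named in the XML declaration and hands back Unicode text, which can then be re-encoded as UTF-8 if 8-bit strings are wanted.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: the <cd> and <title> element names are assumptions
# made for this example, not the actual xget.pl format.
raw = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<cd><title>Caf\xe9 del Mar</title></cd>"
)

# The parser honors the encoding named in the XML declaration.
root = ET.fromstring(raw)

# Text comes back as a Unicode string, whatever the wire encoding was.
title = root.findtext("title")

# Re-encode as UTF-8 if the program wants 8-bit strings internally.
utf8_title = title.encode("utf-8")
```

The byte 0xE9 in the ISO-8859-1 input becomes the two-byte sequence 0xC3 0xA9 in the UTF-8 output, so the caller never has to know which character set the server used.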

>Enter [data track] for the title of a data track.

OK. That's a change to my program, but I'll make the change today. It
would be useful if I could locate a specimen CD with a data track (I
swear I had one when I was doing my CDDB support), but I've only found
ones that put the data before the first audio track (or perhaps after the
leadout).

Does anyone know of a widely-available CD with a data track on it?

>>-- I did a lookup on an album with 15 tracks, Bad
>>Religion's "stranger than fiction", ID
>>FUOVM1t7bROszZlP5Sk0Bzfp_Ow-. The returned information is
>>for an album with 17 tracks, presumably a different
>>version of the same recording with bonus tracks. Luckily,
>>the first 15 tracks happen to match. What's the CD Index
>>approach for this sort of thing?

>I'm working on fixing such problems -- the latest
>submission scripts are much more careful about checking the
>number of tracks and so forth. We'll need to put more
>safeguards in place...

So your intent is to keep the 15-track and 17-track versions of an album
like this separate, even though they are "the same"? That sounds right to
me. You might want a "related recordings" feature like the one in the
Internet Movie Database, where they link versions of the same movie
together.

The number of tracks is not the only issue, though. If someone enters a
CD, chooses a title from the list, and then discovers that the tracks
don't really match that CD, there should be a path in the CD entry process
that lets them create a distinct CD with the same title, or simply
reconsider linking this CD ID to the existing title. Don't you think?
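Here is a rough sketch of the kind of entry path I have in mind, in Python. Everything in it is hypothetical (the Title class, the link_or_split function, the confirmed_same flag); it is not how the CD Index scripts work, just one way the decision could be structured.

```python
from dataclasses import dataclass, field


@dataclass
class Title:
    """A hypothetical title record with the CD IDs linked to it."""
    name: str
    track_names: list
    cd_ids: set = field(default_factory=set)


def link_or_split(cd_id, track_count, existing, confirmed_same):
    """Link cd_id to `existing`, or create a distinct entry with the same name.

    confirmed_same is the user's answer after looking at the track list:
    True only if the tracks shown really are the tracks on their CD.
    """
    if track_count == len(existing.track_names) and confirmed_same:
        existing.cd_ids.add(cd_id)
        return existing
    # Different track count, or the user says the tracks don't match:
    # start a separate entry and leave the track names to be filled in.
    return Title(existing.name, [""] * track_count, {cd_id})
```

With the Bad Religion example, a 15-track CD looked up against a 17-track entry would take the second branch and get its own entry, rather than being attached to the 17-track one.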

>I haven't really decided exactly what extra data to store.

I think that the Internet Movie Database is an excellent example here.
Here's some food for thought:

Extra data on CDs (in some cases per track) could include:

= copyright date
- performance date
- release date
- label (publisher)
= composer name
= lyricist name
- producer name
- other credits
- cover art (JPEG?)
- trivia notes
= links to related discs (another version of the same disc, other
discs in a series of some sort, e.g. disc 2 of a 2-disc collection)
- transcribed liner notes
- links to reviews
- reviews by CD Index users
- technical specifications (the old ADD stuff?, others)
- awards

Extra data on contributors (artists, composers, lyricists, other) could
include:

- links to fan sites
- alternate names used by the artist
- trivia
- birth/death date and place
- awards

Artists could be given unique IDs too, so that two artists with the same
name could be distinguished.

I'm not suggesting doing all of these. There are a few that I think are
best suited to the stage we're at now. But it's a straw man list.

The most valuable items to me seem to be the names of the composer and
lyricist (especially for classical CDs), copyright date, and links to
other versions of the same disc and other discs in a multi-disc
collection.

As the database becomes more precise, we could deal with the fact that some
of these items are per-CD (should be stored separately for each CD ID),
others are per-title (shared by all CDs that are variant versions of the
same thing), others are per-track (the same for a track that recurs on many CDs), and
of course others are per-person.
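To make the four levels concrete, here is a sketch of how the records might be split up, written as Python dataclasses. All of the class and field names are my own assumptions for illustration, not a proposed schema, and the second CD ID is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    """Per-person data. The ID keeps two artists with the same name apart."""
    person_id: int
    name: str


@dataclass
class Track:
    """Per-track data, shared wherever the same track appears."""
    name: str
    composer: Optional[Person] = None
    lyricist: Optional[Person] = None


@dataclass
class Title:
    """Per-title data, shared by all variant versions of a recording."""
    name: str
    artist: Person
    copyright_year: Optional[int] = None


@dataclass
class Disc:
    """Per-CD data, stored separately for each CD ID."""
    cd_id: str
    title: Title
    tracks: list = field(default_factory=list)
    label: Optional[str] = None


# Two variant discs (15 and 17 tracks) sharing one title record.
artist = Person(1, "Bad Religion")
title = Title("Stranger Than Fiction", artist)
disc_15 = Disc("FUOVM1t7bROszZlP5Sk0Bzfp_Ow-", title, [Track("")] * 15)
disc_17 = Disc("hypothetical-17-track-id", title, [Track("")] * 17)
```

Because both Disc records point at the same Title, the "related recordings" link falls out of the structure for free: any two discs with the same title are variants of each other.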

>Another thing that I was thinking about is the data that is
>contained in the CD Researcher -- I am working on an XML
>submission process. Could we build a feature into the CD
>Researcher that allows the user to upload the data to the
>CD Index?

People definitely want to upload data. But CD Researcher aka Audiofile
Internet Companion is a "lookup-only" adjunct program that's used for
importing information into a database; it doesn't have a user interface
for typing in information. People who use the Audiofile database enter
information about their CDs, but rarely put the accurate CD track lengths
in with it.

A future version of Audiofile could correlate someone's Audiofile
database with the actual CD lengths and then upload. The current
structure with a separate Audiofile Internet Companion program doesn't
facilitate this, but at some point it makes sense to build the feature
into the database instead of having a separate program on the side.

I have added a feature to the CD Index version of CDR/AIC that launches a
web browser and takes you to the CD Index CD entry page. This lets people
look up a batch of CDs and then enter the information for the ones that
aren't in the database yet. This is a reasonable starting point.
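For anyone who wants to do the same thing in their own program, the idea can be sketched in a few lines of Python. The base URL and the "id" query parameter here are placeholders I made up; the real CD Index entry page address and its parameters may be different.

```python
import webbrowser
from urllib.parse import urlencode

# Placeholder address: an assumption for this sketch, not the real entry page.
ENTRY_PAGE = "http://example.org/cdindex/enter"


def build_entry_url(cd_id, base=ENTRY_PAGE):
    """Return the entry-page URL for one CD ID (the parameter name is assumed)."""
    return base + "?" + urlencode({"id": cd_id})


def open_entry_pages(cd_ids):
    """Open the entry page for each CD that wasn't found in the database."""
    for cd_id in cd_ids:
        webbrowser.open(build_entry_url(cd_id))
```

After a batch lookup, the program would collect the IDs that came back empty and pass them to open_entry_pages.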

-- Darin