Re: Summary of the days ideas #1

August Zajonc (augustz nospam at bigfoot.com)
Wed, 10 Mar 1999 10:19:35 -0800

Like your summary, and I support the RDBMS, but wanted to point out that for
the most common query, which will be CD Name, Artist Name/Track titles,
dumping to a text file and querying from there will probably be slightly more
efficient... Then again it may be a "so what" kind of deal anyway, as there is
enough horsepower to go around. And you can still use the RDBMS on the
backend anyway...

August

-----Original Message-----
From: Bremford, Mike <Mike.Bremford nospam at gs.com>
To: 'cdindex nospam at freeamp.org' <cdindex nospam at freeamp.org>; 'cdin nospam at cdin.org'
<cdin nospam at cdin.org>; 'freecddb-devloper nospam at bigred.lcs.mit.edu'
<freecddb-devloper nospam at bigred.lcs.mit.edu>
Date: Wednesday, March 10, 1999 10:11 AM
Subject: Summary of the days ideas #1

Hi folks

I'm sure I'm not the only one who's going to do this, so please consider
this a summary in progress. This is as much for my own reference as anyone
else's, anyway.

The following discussions *seem* to have taken place over the last day or
so:

1. Protocol. A number of suggestions have been made, but they seem IMHO
to boil down to two.
a) A fixed field format
b) An extensible one (read XML).

Option a) is: Easier to implement, simpler, but limited in what fields
it can carry. Unlikely to keep everybody happy, as no matter how many
fields we add, there will always be one we've missed.

Option b) is: More flexible, with a proposal to have a subset of
mandatory fields which are recognised by all clients, and as many
optional fields as we like. Possible issue with bandwidth however, and
is harder to implement. May be unnecessarily complex.
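
To make option b) a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the element names (cd, title, artist, track, producer) and the choice of which fields are mandatory are made up, and nothing like this has been agreed on the list. It only shows the idea of a small mandatory subset plus optional fields that a client can carry along without understanding them.

```python
import xml.etree.ElementTree as ET

# Hypothetical mandatory fields that every client would have to understand.
MANDATORY = ("title", "artist")

def parse_record(xml_text):
    """Split a record into mandatory fields, tracks and unknown optional fields."""
    root = ET.fromstring(xml_text)
    record = {"tracks": [], "optional": {}}
    for child in root:
        text = (child.text or "").strip()
        if child.tag in MANDATORY:
            record[child.tag] = text
        elif child.tag == "track":
            record["tracks"].append(text)
        else:
            # A client that does not know this field just keeps it as-is.
            record["optional"][child.tag] = text
    missing = [name for name in MANDATORY if name not in record]
    if missing:
        raise ValueError("missing mandatory fields: " + ", ".join(missing))
    return record

# Made-up sample record; "producer" plays the role of an optional field.
SAMPLE = """<cd>
  <title>Example Album</title>
  <artist>Example Artist</artist>
  <track>First Song</track>
  <track>Second Song</track>
  <producer>Someone</producer>
</cd>"""

record = parse_record(SAMPLE)
```

A fixed field format (option a) would have no equivalent of the "optional" bucket, which is the limitation described above.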

2. Distribution. This is hairy. The following methods have been mentioned
a) DNS (with HTTP proxy for those with firewalls)
b) NNTP, or at least a minor variation on it. (with HTTP proxy again)
c) HTTP with mirroring (as CDDB does now)

Option a) is: Already implemented on all platforms, works through most
firewalls, gives automatic caching and replication, will handle huge
numbers of records and is fault tolerant. But, it also relies on
current DNS servers, which may be too slow, will not allow submission
of entries (at least easily), and has a central point where everything
is replicated from.

Option b) is: Has no central point, replicates everywhere over time,
allows submission of entries. However, it doesn't handle duplicate
entries, and has no way of ensuring against data loss at one site. (It
also is blocked by a large number of firewalls - mine included).

Option c) is: Well understood (it is the current model used by CDDB),
passes through most firewalls, and fairly simple. However, it is
slightly slower than the other models, and means higher stress on
individual machines rather than spreading the load. Mirrors also have
to be set up manually, as HTTP has no provision for distribution.

3. Server. I think most people are settled on an RDBMS of some sort, as
opposed to the other option of a flat file. Free options include
MySQL and PostgreSQL. Someone else suggested that it really doesn't
matter how each server handles it - the Perl DBI library should
allow you to chop and change DBs to a point.

(NB. This is assuming some sort of centralised server to receive the
submissions before farming them out to DNS/HTTP. I'm not sure how
the concept of a "server" fits in with the NNTP model)
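
As a rough illustration of the RDBMS option, the sketch below uses Python's built-in sqlite3 module purely so that it is self-contained; it stands in for MySQL or PostgreSQL and is not a proposal. The table layout, column names and the disc ID value are all hypothetical.

```python
import sqlite3

# In-memory database, only for the sake of a self-contained example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cd (
        disc_id TEXT PRIMARY KEY,
        title   TEXT NOT NULL,
        artist  TEXT NOT NULL
    );
    CREATE TABLE track (
        disc_id TEXT NOT NULL REFERENCES cd(disc_id),
        number  INTEGER NOT NULL,
        title   TEXT NOT NULL,
        PRIMARY KEY (disc_id, number)
    );
""")

# Made-up sample data.
conn.execute("INSERT INTO cd VALUES (?, ?, ?)",
             ("abc123", "Example Album", "Example Artist"))
conn.executemany("INSERT INTO track VALUES (?, ?, ?)",
                 [("abc123", 1, "First Song"), ("abc123", 2, "Second Song")])

def lookup(conn, disc_id):
    """The common query: CD name, artist name and track titles for one disc."""
    cd = conn.execute("SELECT title, artist FROM cd WHERE disc_id = ?",
                      (disc_id,)).fetchone()
    if cd is None:
        return None
    tracks = [row[0] for row in conn.execute(
        "SELECT title FROM track WHERE disc_id = ? ORDER BY number",
        (disc_id,))]
    return {"title": cd[0], "artist": cd[1], "tracks": tracks}
```

Because the queries are plain SQL with placeholders, swapping the backend should mostly be a matter of changing the connection, which is the point made about DBI above.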

4. Submission. Hasn't been much talk on this, come to think of it. CDDB
currently submits entries via email I believe (I may be wrong here),
but I guess we can narrow it down to four options
a) Fixed format Email
b) Simple one way protocol - like a form submission on the web
c) More complex protocol with provision for feedback from the server
d) NNTP post, assuming we go for the NNTP model

Option a) means that if the server receiving the submission is down,
the submission is queued.
Option b) means submissions are processed more quickly, and is nice &
simple.
Option c) is more complicated, but gives greater ability to catch
user submission errors early on.
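
A small sketch of what "catching submission errors early" could look like on the server side, in Python. The KEY=value layout and the field names (DISCID, ARTIST, TITLE) are assumptions made up for this example, not an agreed format. With option c) the error list could be sent straight back to the client; with a) or b) it could only be reported after the fact, if at all.

```python
# Hypothetical required fields for a submission.
REQUIRED = ("DISCID", "ARTIST", "TITLE")

def parse_submission(body):
    """Parse KEY=value lines and collect human-readable errors."""
    fields, errors = {}, []
    for lineno, line in enumerate(body.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        key, sep, value = line.partition("=")
        if not sep:
            errors.append(f"line {lineno}: expected KEY=value")
            continue
        fields[key.strip().upper()] = value.strip()
    for name in REQUIRED:
        if not fields.get(name):
            errors.append(f"missing field {name}")
    return fields, errors

good = "DISCID=abc123\nARTIST=Example Artist\nTITLE=Example Album\n"
bad = "DISCID=abc123\ngarbage\n"

good_fields, good_errors = parse_submission(good)
bad_fields, bad_errors = parse_submission(bad)
```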

5. Genres.
a) Stick with CDDB's current list
b) Use the ID3 list of genres
c) Have some sort of main category with user entered keywords
For all of these, we can have either a single genre or multiple genres.

6. Licensing. I know sod all about this, but these are the buzzwords I
recognised.
a) LGPL
b) BSD

7. Conflict resolution. From the simplest to the most complex we have
a) Latest entry rules.
b) Most common entry rules
c) PGP signed submissions - so the submitter is accountable in case
of spam
(I have to admit this last one seems like overkill. Spamming a free
music database seems an awful lot like pissing in your own pool to me)
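
The first two rules are simple enough to sketch in a few lines of Python. The submission records here (a "received" sequence number and a "title") are made up for the example; a real entry would obviously carry more than a title.

```python
from collections import Counter

def latest_entry(submissions):
    """Option a): the most recently received submission wins."""
    return max(submissions, key=lambda s: s["received"])["title"]

def most_common_entry(submissions):
    """Option b): the value submitted most often wins."""
    counts = Counter(s["title"] for s in submissions)
    return counts.most_common(1)[0][0]

# Two matching submissions followed by a later one with typos.
submissions = [
    {"received": 1, "title": "Example Album"},
    {"received": 2, "title": "Example Album"},
    {"received": 3, "title": "Exmaple Albmu"},
]

latest = latest_entry(submissions)
common = most_common_entry(submissions)
```

With this sample data, "latest entry rules" picks the misspelled title because it arrived last, while "most common entry rules" keeps the title that two submitters agreed on.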

8. Mailing lists :-)
David suggested splitting the list into several, and I haven't heard
anyone disagree yet, so I guess we'll go that way when Greg Stein
reappears. AFAIK there are three lists to date:
a) cdin.org
b) bigred.lcs.mit.edu
c) freeamp.org

9. A few points that everyone seems to agree on (yes, there are some):
* Use of UTF-8 (Unicode) to allow foreign character sets
* Allow current CDDB clients to access this new service by providing
some sort of portal.

I think that's it. Apologies if I missed anyone's ideas out, but I am (we
are) drowning in email here.

Personally, my opinion would be to go for a single server that receives email
submissions, processes them to handle conflicts, and then pops them out to DNS
for replication, which is then queried by the clients (either directly or
via an HTTP proxy). But then, as they say, opinions are like arseholes -
everybody has one.

Hope this helps. If I've got it all horribly wrong, please ignore me.

Cheers... Mike