Re: Reality Check and Ideas

Matthew Mastracci (mmastrac nospam at ucalgary.ca)
Tue, 09 Mar 1999 20:24:13 -0700

Instead of a hierarchical structure, a solution similar to the way IRC
is set up might be a better way of doing things. Each server would have
a number of trusted connections to other servers (i.e., if you want to
start a mirror of the server, you need to contact one of the server
admins and become "trusted") without circular connections. As well,
this gives someone with a trusted connection to the database network a
method of "cancelling" an invalid entry in the database. If they abuse
their powers at any point, they can simply be "cut out of the loop" by
their neighbours. Trust can be achieved and maintained by servers
signing each of the packets they send to other servers. Connections can
only be established with the consent of both server operators. For each
updated entry, the information is passed to each of the connected
servers and travels like a nerve impulse to the ends of the network.
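A rough sketch of that propagation, just to make the idea concrete. It
assumes the links really do form a tree (no circular connections), and
it stands in an HMAC over a per-link shared key for "signing"; a real
setup would more likely use public-key signatures. The Server class and
all of its method names are made up for illustration.

```python
import hashlib
import hmac


class Server:
    """One node in a hypothetical acyclic network of trusted servers."""

    def __init__(self, name):
        self.name = name
        self.links = {}    # neighbour name -> (Server, shared link key)
        self.entries = {}  # disc id -> entry text

    def connect(self, other, key):
        # Both operators consent to the link by agreeing on a key
        # (an assumption standing in for a real trust handshake).
        self.links[other.name] = (other, key)
        other.links[self.name] = (self, key)

    def cut(self, name):
        # "Cut out of the loop": drop the link to a misbehaving neighbour.
        neighbour, _ = self.links.pop(name)
        neighbour.links.pop(self.name, None)

    def submit(self, disc_id, text):
        # A locally submitted update: store it, then pass it to neighbours.
        self.entries[disc_id] = text
        self._forward(disc_id, text, skip=None)

    def receive(self, sender, disc_id, text, tag):
        link = self.links.get(sender)
        if link is None:
            return False  # sender is not a trusted neighbour
        _, key = link
        if not hmac.compare_digest(tag, sign(key, disc_id, text)):
            return False  # signature does not check out
        self.entries[disc_id] = text
        # Pass it on to everyone except the neighbour it came from.
        # This only terminates because the network has no cycles.
        self._forward(disc_id, text, skip=sender)
        return True

    def _forward(self, disc_id, text, skip):
        for name, (neighbour, key) in list(self.links.items()):
            if name != skip:
                tag = sign(key, disc_id, text)
                neighbour.receive(self.name, disc_id, text, tag)


def sign(key, disc_id, text):
    # HMAC-SHA256 over the id and entry, keyed per link.
    msg = disc_id.encode() + b"\0" + text.encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()
```

With three servers chained a-b-c, a.submit(...) on a ends up in c's
entries via b, and once b calls cut("c") later updates stop there.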

To replicate missing entries, a server can query each of its neighbours
for an entry which a user has requested but which never propagated to
it (as a result of a failure of some sort). If no servers respond with
an acknowledgement (within an acceptable period), the user is told that
the CD could not be found. With CDDB, the majority of all popular CDs
are in the database, so this case would be a big exception. As well,
this would allow a server to run as a "caching only server" without the
full load of entries.

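Something like the following is what I have in mind for the "ask the
neighbours" step. It is only a sketch: the lookup function, the idea of
neighbours as plain callables, and the 2 second default timeout are all
assumptions of mine, not part of any existing CDDB server.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed


def lookup(disc_id, cache, neighbours, timeout=2.0):
    """Return an entry from the local cache, else ask each neighbour.

    `neighbours` is a list of callables that take a disc id and return
    an entry or None (a hypothetical interface).
    """
    if disc_id in cache:
        return cache[disc_id]
    pool = ThreadPoolExecutor(max_workers=max(1, len(neighbours)))
    try:
        futures = [pool.submit(ask, disc_id) for ask in neighbours]
        for future in as_completed(futures, timeout=timeout):
            try:
                entry = future.result()
            except Exception:
                continue  # a failing neighbour counts as no answer
            if entry is not None:
                cache[disc_id] = entry  # keep it for the next request
                return entry
    except TimeoutError:
        pass  # nobody acknowledged within the acceptable period
    finally:
        pool.shutdown(wait=False)  # do not block on slow neighbours
    return None  # caller tells the user the CD could not be found
```

A caching only server is then just one that starts with an empty cache
and fills it as requests come in.
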
This process could also be repeated "behind the scenes" for CD entries
which were updated/created a significant period of time ago (six
months or so) or during peak hours. Entries which have no newer
versions can simply be updated with the new expiry time and sit back in
the database. Most updates to the database would be minor cosmetic
changes, such as spelling errors. More grotesque errors are fairly
likely to be noticed by either the requester or a submitter and another
update would be forced. During non-peak hours, connected servers can
also exchange "chunks" of the database for a mutual sync.
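The background refresh could look roughly like this. The Entry fields
(a revision counter and an expiry timestamp) and the refresh function
are hypothetical, and the six month figure is just the ballpark from
above.

```python
import time
from dataclasses import dataclass

SIX_MONTHS = 182 * 24 * 3600  # "six months or so", in seconds


@dataclass
class Entry:
    text: str
    revision: int
    expires: float


def refresh(entry, neighbour_copies, now=None):
    """Re-validate one entry against the copies held by neighbours."""
    now = time.time() if now is None else now
    if now < entry.expires:
        return entry  # not stale yet, leave it alone
    # Pick the highest revision any neighbour has; fall back to our own.
    newest = max(neighbour_copies, key=lambda e: e.revision, default=entry)
    if newest.revision > entry.revision:
        entry = Entry(newest.text, newest.revision, entry.expires)
    # Newer version or not, the entry gets a new expiry time and
    # goes back to sitting in the database.
    entry.expires = now + SIX_MONTHS
    return entry
```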

I guess the big issue is that CDDB is not essential information, so it
can be inaccurate (or missing) in an infinitesimal number of cases without
users getting really upset over it.

Wow... that's a lot more than I planned to write. ;)

Alan Cox wrote:
>
> > transmit diffs. Heck, even using rsync to transmit the zone files
> > would work better than using the DNS zone transfer. And I _want_ to be
> > able to mirror the whole database, not just some subdomain of it.
> >
> > But the strongest argument in my eyes is that there's again some
> > central server.
>
> There is always a central point. Its either a keyserver or a replication
> server. The issue is also completely bogus. There are _thousands_ of
> people with copies of older cddb data sets, on CD whatever. The license makes
> the difference not the posession. You make the license and the policy require
> publishing the CD data
>
> > Use some more bits than 32, or you'll get collisions due to the
> > birthday paradoxon. But I agree that works for accessing the
> > information.
>
> You can just mark collisions and do further queries by MD5 hash if need be
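
To put a number on the birthday problem and the MD5 fallback, here is a
quick sketch. The figure of 100,000 entries is an illustrative guess of
mine, and hashing the comma-joined track offsets is just one assumed way
to build a longer id; neither comes from the CDDB spec.

```python
import hashlib
import math


def collision_probability(entries, bits):
    """Birthday approximation: chance that any two of `entries` ids clash."""
    return 1.0 - math.exp(-entries * (entries - 1) / (2.0 * 2.0 ** bits))


def long_id(track_offsets):
    """MD5 over the track offsets, for use only when a short id is ambiguous."""
    data = ",".join(str(offset) for offset in track_offsets).encode()
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    print(collision_probability(100_000, 32))  # 32-bit ids
    print(collision_probability(100_000, 64))  # 64-bit ids
    print(long_id([150, 18000, 41000]))
```

With that many entries a 32-bit id is more likely than not to have at
least one clash somewhere, while 64 bits makes it negligible. Marking
the clashing ids and asking for the MD5 only in those cases keeps the
common lookup short.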

-- 
/\/\att /\/\astracci                            mmastrac nospam at ucalgary.ca

"The act of breaking into a computer system has to have the same social stigma as breaking into a neighbor's house. It should not matter that the neighbor's door is unlocked." [Ken Thompson, 1983 Turing Award Lecture]