Re: cdi server (mirroring ideas)

robert nospam at moon.eorbit.net
Thu, 1 Apr 1999 10:29:04 -0800 (PST)

On 1 Apr, robin nospam at acm.org wrote:
> Let's look at these conflicts. I can only think of two types:
> * two (or more) people both enter a new record at roughly the same time.
> Given that replication in this scheme isn't instantaneous, there
> is a time window in which this can happen. Depending on the speed
> of replication, the window might be quite small, but even if it is
> several hours, I wouldn't expect there to be very many conflicts
> caused this way.
> * someone corrects an existing entry. This is much more likely.
> As it is an update to a specific version of a specific record,
> a simple chain of updates can be automatically followed to its
> conclusion. But I would recommend keeping the intermediate
> records also, so that bad updates can be undone and some sort of
> human mediation is possible.

Yes, some roll-back capabilities would be nice. Worst case, I can imagine
a group of people maliciously sending bad update requests that end up
trashing the entire database. The fallback there is to make sure that
at least some servers are doing frequent backups: restore from backup
and let the data ripple back through the system.

That's too heavy-handed an approach, though. What if each record that
gets updated has its previous contents written to a 'rollback log', so
that if a record is identified as bad, a script could cruise through the
rollback log, find the previous contents of the record, and restore them
to the database?
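
To make the rollback log idea concrete, here is a minimal sketch. It
assumes an in-memory store; the names RecordStore, update and rollback
are made up for illustration and are not part of any existing CD Index
code. A real server would keep the log on disk.

```python
import copy


class RecordStore:
    """Toy in-memory database with a rollback log (hypothetical names)."""

    def __init__(self):
        self.records = {}       # record_id -> current contents
        self.rollback_log = []  # append-only list of (record_id, previous contents)

    def update(self, record_id, new_contents):
        # Log the previous contents (None if the record is new) before overwriting.
        previous = copy.deepcopy(self.records.get(record_id))
        self.rollback_log.append((record_id, previous))
        self.records[record_id] = new_contents

    def rollback(self, record_id):
        """Undo the most recent update to record_id.

        Returns True if a log entry was found and applied, False otherwise.
        """
        # Walk the log from newest to oldest looking for this record.
        for i in range(len(self.rollback_log) - 1, -1, -1):
            logged_id, previous = self.rollback_log[i]
            if logged_id == record_id:
                del self.rollback_log[i]
                if previous is None:
                    # The bad update created the record, so remove it.
                    self.records.pop(record_id, None)
                else:
                    self.records[record_id] = previous
                return True
        return False
```

Calling rollback() repeatedly on the same record walks back through its
history one update at a time, which also leaves room for the human
mediation mentioned above.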

> It would be good if there was a way to contact the contributor(s) of
> conflicting records, but as we've mentioned before, it isn't a good idea
> to pass email addresses about in the public database. There is a way
> around this. If you trust the server to which you submit your records,
> you can tell it your email address. It can make up a userid for you,
> which it can publish on the records you sent as serverid:userid.
> It also needs to keep a private table of userid:email. Then as long
> as that server is still running, it is possible to get in touch with
> the contributor of a record by asking serverid to forward a message
> to userid. Also, the original contributor can get in touch with the
> server they originally used and prove they know the userid:email secret
> to make authenticated corrections.

That approach worries me. The servers would then be vulnerable to
attack since they store confidential information. I'm not sure I would
want to trust a database of e-mail addresses to everyone who wanted to
run a server.

And as you pointed out, this paper trail is only good for the life of
the server. Combined with the fact that e-mail addresses are at best
temporary, I would really question the value of this.

I agree with you that it would be nice to have a paper trail back to
the author of the record, but I don't think the technology is quite up
to snuff for that.

>> And no matter what we choose, I'd suggest that each server have at least
>> 2 peers. 1 peer servers run the risk of being orphans.
> It would be nice if servers could configure their own peer arrangements
> once they have joined the network. Certainly we need a way for clients to
> find ``near-by'' servers, so presumably the servers will have to exchange
> addresses and connectivity information. I can't imagine having millions
> of servers, so even a complete table won't be unmanageable.

I don't know if this has any merit or not, but could we use ping times
to determine the nearest neighbors? A new server could be brought up
pointed at one other CD Index server. The new server would request
the list of known servers, then ping each one in succession and
choose the two or more with the smallest ping times. Should that process
be repeated on a regular basis to attempt an automatic nearest neighbor
search?
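
A rough sketch of that peer selection, with some assumptions: real ICMP
pings usually need raw sockets and root privileges, so this version
times a TCP connect instead as a stand-in. The port number and the
function names (tcp_connect_time, choose_peers) are made up for
illustration.

```python
import socket
import time


def tcp_connect_time(host, port=80, timeout=2.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    Assumption: TCP connect time is used here in place of an ICMP ping.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def choose_peers(servers, count=2, measure=tcp_connect_time):
    """Return up to `count` servers with the smallest measured latency."""
    timed = []
    for server in servers:
        latency = measure(server)
        if latency is not None:  # skip servers that did not answer
            timed.append((latency, server))
    timed.sort()  # smallest latency first
    return [server for _, server in timed[:count]]
```

Re-running choose_peers() from a periodic job would give the automatic
nearest neighbor search. The measure argument is there so the timing
method can be swapped out without touching the selection logic.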

> We seem to be assuming that all servers replicate the whole database.
> Is that necessary or practical? I could imagine many people being willing
> to run a server on a cast-off 486, but not if they have to commit to
> providing gigabytes of disk space! I think there will probably have to

Well, at the current rate, we've got about 3500 CDs in the system and
that is taking about 3.8MB of disk space. That's a bit more than 1MB
per 1000 CDs. Given a 1 gig partition (which isn't asking much) that
would give us room to store a bit less than 1,000,000 CDs. CDDB
currently has less than 500,000 CDs, so we've got room to grow if we
require servers to dedicate a gig. Is that unreasonable?
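
For anyone who wants to check the arithmetic, here it is spelled out.
It assumes 1 gig = 1000MB and that space per CD stays constant as the
database grows; the function name is made up for illustration.

```python
def estimated_capacity(cds_now, mb_used_now, partition_mb):
    """Estimate how many CDs fit in partition_mb at the current space per CD."""
    mb_per_cd = mb_used_now / cds_now
    return int(partition_mb / mb_per_cd)


# Figures from the current database: about 3500 CDs in about 3.8MB.
mb_per_1000_cds = 3.8 / 3500 * 1000                  # a bit more than 1MB
capacity = estimated_capacity(3500, 3.8, 1000)       # roughly 920,000 CDs
print(f"{mb_per_1000_cds:.2f}MB per 1000 CDs, room for about {capacity} CDs")
```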

--ruaok Freezerburn! All else is only icing. -- Soul Coughing

Robert Kaye -- robert nospam at moon.eorbit.net http://moon.eorbit.net/~robert