Re: cdi server (mirroring ideas)

robin nospam at acm.org
Thu, 1 Apr 1999 15:59:03 +0100

Let's look at these conflicts. I can only think of two types:
* two (or more) people both enter a new record at roughly the same time.
  Given that replication in this scheme isn't instantaneous, there
  is a time window in which this can happen. Depending on the speed
  of replication, the window might be quite small, but even if it is
  several hours, I wouldn't expect there to be very many conflicts
  caused this way.
* someone corrects an existing entry. This is much more likely.
  As it is an update to a specific version of a specific record,
  a simple chain of updates can be automatically followed to its
  conclusion. But I would recommend keeping the intermediate
  records also, so that bad updates can be undone and some sort of
  human mediation is possible.
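
To make the update-chain idea concrete, here is a rough Python sketch.
It is only an illustration under my own assumptions: the names Record,
RecordStore, record_id, version and supersedes are made up for this
example and are not part of any agreed record format. Every version is
kept, each update names the version it was made against, and the chain
is followed until it either ends or forks into competing updates that
would need mediation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    record_id: str             # hypothetical key for one disc entry
    version: str               # hypothetical unique id of this version
    supersedes: Optional[str]  # version this update was made against (None = original)
    data: str


class RecordStore:
    """Keeps every version so that bad updates can be inspected or undone."""

    def __init__(self) -> None:
        self._by_version: Dict[Tuple[str, str], Record] = {}
        self._successors: Dict[Tuple[str, Optional[str]], List[Record]] = {}

    def add(self, rec: Record) -> None:
        key = (rec.record_id, rec.version)
        if key in self._by_version:
            return  # already seen; replication may deliver a record twice
        self._by_version[key] = rec
        self._successors.setdefault((rec.record_id, rec.supersedes), []).append(rec)

    def resolve(self, record_id: str) -> Tuple[List[Record], List[Record]]:
        """Return (chain, conflicts).

        chain is the history from the original to the newest unambiguous
        version. conflicts holds the competing records found where the
        chain stops (empty if the chain simply ends).
        """
        chain: List[Record] = []
        candidates = self._successors.get((record_id, None), [])
        while len(candidates) == 1:
            chain.append(candidates[0])
            candidates = self._successors.get((record_id, candidates[0].version), [])
        return chain, list(candidates)
```

Two people entering the same new record at once shows up here as two
originals, so resolve() returns an empty chain and both records as
conflicts. Undoing a bad update is possible because nothing in the
chain is ever thrown away.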

On Wed, 31 Mar 1999 14:34:58 -0600 Robert Haig <rhaig nospam at hackboy.com> wrote:
> I propose this:
> Each server is going to have an admin (at least one). Conflicting records
> are put in a report to that admin. The admin can then manually update
> the record if they see fit. After they compare it to a few other servers or
> the cd that they have.
If we are hoping to get large numbers of servers installed by interested
but largely uninvolved parties, I don't think we can expect every one to
have an admin person willing to undertake these duties. Better to have
a way for update records to propagate in the same way as the originals.

It would be good if there were a way to contact the contributor(s) of
conflicting records, but as we've mentioned before, it isn't a good idea
to pass email addresses about in the public database. There is a way
around this. If you trust the server to which you submit your records,
you can tell it your email address. It can make up a userid for you,
which it can publish on the records you sent as serverid:userid.
It also needs to keep a private table of userid:email. Then as long
as that server is still running, it is possible to get in touch with
the contributor of a record by asking serverid to forward a message
to userid. Also, the original contributor can get in touch with the
server they originally used and prove they know the userid:email secret
to make authenticated corrections.
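
A minimal sketch of that bookkeeping, in Python, might look like the
following. Everything named here (ContributorRegistry, register,
forward, may_correct, and the send_mail hook) is hypothetical and only
meant to show the shape of the idea: the public tag serverid:userid
goes on the records, while the userid:email table never leaves the
server. I assume the userid itself contains no colon.

```python
import hmac
import secrets
from typing import Callable, Dict


class ContributorRegistry:
    def __init__(self, server_id: str, send_mail: Callable[[str, str], None]) -> None:
        self.server_id = server_id
        self._send_mail = send_mail  # hypothetical mail hook: (address, body)
        self._email_by_userid: Dict[str, str] = {}  # private, never replicated

    def register(self, email: str) -> str:
        """Make up a userid and return the public tag serverid:userid."""
        userid = secrets.token_hex(8)
        self._email_by_userid[userid] = email
        return f"{self.server_id}:{userid}"

    def _lookup(self, public_tag: str):
        """Return the private email for a tag issued by this server, else None."""
        server_id, _, userid = public_tag.rpartition(":")
        if server_id != self.server_id:
            return None
        return self._email_by_userid.get(userid)

    def forward(self, public_tag: str, message: str) -> bool:
        """Pass a message on to the contributor without revealing the address."""
        email = self._lookup(public_tag)
        if email is None:
            return False
        self._send_mail(email, message)
        return True

    def may_correct(self, public_tag: str, claimed_email: str) -> bool:
        """Accept a correction only from someone who knows the userid:email pair."""
        email = self._lookup(public_tag)
        if email is None:
            return False
        return hmac.compare_digest(email.encode(), claimed_email.encode())
```

Knowing the email address is a weak secret, of course; a real server
would more likely mail out a one-time token, but the table lookup
would be the same.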

> And no matter what we choose, I'd suggest that each server have at least
> 2 peers. 1 peer servers run the risk of being orphans.
It would be nice if servers could configure their own peer arrangements
once they have joined the network. Certainly we need a way for clients to
find ``near-by'' servers, so presumably the servers will have to exchange
addresses and connectivity information. I can't imagine having millions
of servers, so even a complete table won't be unmanageable.

We seem to be assuming that all servers replicate the whole database.
Is that necessary or practical? I could imagine many people being willing
to run a server on a cast-off 486, but not if they have to commit to
providing gigabytes of disk space! I think there will probably have to
be a backbone of servers which know all the records, but maybe that is
just one end of a spectrum extending all the way down to caching-only
servers that store nothing. Indeed, in this model, a client is just a
non-caching, non-storing server. Perhaps a stratum hierarchy would work,
so a server would always ask its siblings first, then, if that fails,
ask its parent. The backbone would just be the servers with no parent.
Servers would push new records at their parents, and optionally push
updates at their children.
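
Here is a toy Python model of that hierarchy, just to check that the
lookup and push rules hang together. The Server class, its stores flag
and the lookup/submit methods are all invented for this sketch. I have
assumed that a server only looks at what its siblings already hold
(so queries cannot loop), and replication between backbone servers is
not modelled at all.

```python
from typing import Dict, List, Optional


class Server:
    def __init__(self, name: str, parent: Optional["Server"] = None,
                 stores: bool = True) -> None:
        self.name = name
        self.parent = parent
        self.stores = stores  # False models a non-caching, non-storing client
        self.children: List["Server"] = []
        self.records: Dict[str, str] = {}
        if parent is not None:
            parent.children.append(self)

    def siblings(self) -> List["Server"]:
        if self.parent is None:
            return []
        return [s for s in self.parent.children if s is not self]

    def _remember(self, key: str, value: str) -> str:
        if self.stores:
            self.records[key] = value
        return value

    def lookup(self, key: str) -> Optional[str]:
        """Try local records, then siblings, then the parent."""
        if key in self.records:
            return self.records[key]
        for sib in self.siblings():
            if key in sib.records:  # siblings answer from local store only
                return self._remember(key, sib.records[key])
        if self.parent is not None:
            value = self.parent.lookup(key)
            if value is not None:
                return self._remember(key, value)
        return None

    def submit(self, key: str, value: str) -> None:
        """Keep a new record (if this server stores) and push it at the parent."""
        if self.stores:
            self.records[key] = value
        if self.parent is not None:
            self.parent.submit(key, value)
```

For example, with a backbone server, two mid-level servers a and b
below it, and a client hanging off a, a record submitted by the client
ends up on a and on the backbone, and b picks it up from its sibling
the first time somebody asks b for it.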

Robin.

-- 
R.M.O'Leary <robin nospam at acm.org> +44 7010 7070 44, PO Box 20, Swansea SA2 8YB, UK