RE: cdi server (mirroring ideas)

Shawn Jacques (jax nospam at alh.com)
Thu, 1 Apr 1999 11:38:08 -0800

> -----Original Message-----
> From: robert nospam at moon.eorbit.net [mailto:robert nospam at moon.eorbit.net]
> Sent: Thursday, April 01, 1999 10:29 AM
>
> Yes, some roll-back capabilities would be nice. Worst case I can imagine
> a group of people maliciously sending bad update requests that will end
> up trashing the entire database. Worst fallback there is to make sure
> that at least some servers are doing frequent back-ups. Restore to backup
> and let the data ripple back through the system.
>
> That's too heavy handed of an approach -- what if each record that gets
> updated gets written to a 'rollback log', so that if a record is
> identified to be bad, a script could cruise through the rollback log and
> find the previous contents of the record and accept it into the
> database?

I think this might be a case of crossing that bridge when we come to it.
I don't remember there ever being any large scale "attacks" on cddb
trying to "corrupt" the database by submitting a bunch of bad records.
Of course, that doesn't mean it can't happen :-), but in a system where
we are relying on the data being entered in a distributed way I don't
think there is going to be a foolproof solution.

Given that, one possibility is that a hash could be made from a
combination of the user's email address and IP address and stored with
each update. If it became obvious that a particular user or IP address
was entering a slew of bad records, they could all be rolled back based
on this stored hash (assuming a rollback log for each record).
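
To make that concrete, here is a rough sketch of how the hash and a
per-record rollback log might fit together. Everything in it is an
assumption for illustration: the names submitter_key and RollbackStore,
the choice of SHA-256, and the in-memory dictionaries standing in for
the real database.

```python
import hashlib
from collections import defaultdict


def submitter_key(email: str, ip: str) -> str:
    """Hash the submitter's email and IP into one opaque key (assumed scheme)."""
    normalized = f"{email.strip().lower()}|{ip.strip()}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class RollbackStore:
    """Toy in-memory store: current records plus a rollback log per record."""

    def __init__(self):
        self.current = {}              # record_id -> (data, submitter key)
        self.log = defaultdict(list)   # record_id -> older (data, key) versions

    def update(self, record_id, data, email, ip):
        # Push the version being replaced onto the rollback log first.
        if record_id in self.current:
            self.log[record_id].append(self.current[record_id])
        self.current[record_id] = (data, submitter_key(email, ip))

    def rollback_submitter(self, key):
        """Undo every record whose current version came from `key`."""
        restored = []
        for record_id, (_, owner) in list(self.current.items()):
            if owner != key:
                continue
            history = self.log[record_id]
            # Skip over older versions from the same submitter.
            while history and history[-1][1] == key:
                history.pop()
            if history:
                self.current[record_id] = history.pop()
            else:
                # Nothing trustworthy to fall back to: drop the record.
                del self.current[record_id]
            restored.append(record_id)
        return restored
```

Note that this only undoes records where the bad submitter made the most
recent change; a bad edit buried under a later good one would need a
human to look at it.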

For general data conflicts (spelling errors, somebody enters the wrong
track, etc.) it probably makes sense to just flag the conflict and send
email off to some human being. If no one cares enough to make the
change, that's probably fine: the next person who likes that album, looks
it up, and notices the error will probably update it. Given this
system, I'd say having the latest change take precedence will keep
propagating errors to a minimum.
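
A minimal sketch of "latest change wins, but flag the conflict" could
look like the following. The Version type, its timestamp field, and the
merge function are hypothetical names, and the sketch assumes the
servers' clocks are close enough for timestamps to be compared.

```python
from dataclasses import dataclass


@dataclass
class Version:
    data: str
    timestamp: float  # when the change was submitted, seconds since epoch


def merge(existing: Version, incoming: Version):
    """Return (winner, conflict): newest version wins, differences get flagged."""
    conflict = existing.data != incoming.data
    winner = incoming if incoming.timestamp >= existing.timestamp else existing
    return winner, conflict
```

When conflict comes back true, the server would keep the winner and
queue a note for whoever is reading the conflict mail.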

> I don't know if this has any merit or not, but could we use ping times
> to determine the nearest neighbor? A new server could be brought up,
> pointing it to one other CD Index server. The new server would request
> the list of known servers, and then in succession ping each one and
> choose two or more with the smallest ping times. Should that process be
> repeated on a regular basis to attempt an automatic nearest neighbor
> search?

I like this idea. Depending on which direction the data is flowing (if I
enter a new CD does this propagate to the other servers, or do the other
servers query "nearby" servers for any new information?) you'd have to
be careful not to "orphan" servers with high ping times, which nobody
would pick as a neighbor. Otherwise, it sounds like a pretty simple (and
useful) "load balancing" system.
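
The selection step might be sketched like this. A real ICMP ping usually
needs raw sockets and extra privileges, so this version swaps in the
time to open a TCP connection as a stand-in for ping time; that
substitution, the port number, and the function names are all
assumptions.

```python
import socket
import time


def connect_time(host, port=80, timeout=2.0):
    """Seconds taken to open a TCP connection, or None if unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def nearest_neighbors(servers, count=2, measure=connect_time):
    """Return up to `count` reachable servers with the smallest measured time."""
    timed = []
    for host in servers:
        rtt = measure(host)
        if rtt is not None:          # skip servers that did not answer
            timed.append((rtt, host))
    timed.sort()
    return [host for _, host in timed[:count]]
```

Running nearest_neighbors again on a schedule would give the automatic
nearest-neighbor search Robert describes. It does nothing about the
orphaning problem, which would need each server to also accept some
neighbors it did not choose.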

>
> > We seem to be assuming that all servers replicate the whole
> > database. Is that necessary or practical? I could imagine many
> > people being willing to run a server on a cast-off 486, but not if
> > they have to commit to providing gigabytes of disk space! I think
> > there will probably have to
>
> Well, at the current rate, we've got about 3500 CDs in the system and
> that is taking about 3.8MB of disk space. That's a bit more than 1MB
> per 1000 CDs. Given a 1 gig partition (which isn't asking much) that
> would give us room to store a bit less than 1,000,000 CDs. CDDB
> currently has fewer than 500,000 CDs, so we've got room to grow if we
> require servers to dedicate a gig. Is that unreasonable?
>

Given the cost of disk space currently, putting a couple-gig drive in a
machine will run you about the same as a good dinner for two. I'd
say that's a decent metric. :-)
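
For anyone who wants to check the quoted numbers, the arithmetic is
short. The only inputs are the 3500 CDs and 3.8MB from the message
above; treating a gig as 1024MB is my assumption, and it ignores
filesystem and index overhead.

```python
# Figures quoted above: about 3500 CDs take about 3.8 MB on disk.
cds_stored = 3500
mb_used = 3.8

mb_per_cd = mb_used / cds_stored
mb_per_1000_cds = mb_per_cd * 1000

# Assumption: 1 gig = 1024 MB, with no overhead counted.
partition_mb = 1024
cd_capacity = int(partition_mb / mb_per_cd)

print(f"{mb_per_1000_cds:.2f} MB per 1000 CDs")
print(f"room for about {cd_capacity:,} CDs in {partition_mb} MB")
```

That works out to a little over 1MB per 1000 CDs and somewhat under a
million CDs per gig, which matches the estimate in the quoted text.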

Jax