Is this using anything approaching a full data set in size, or an
approach to lookups that would apply to such a data set? That is,
will this number remain the same when you have over 100k entries? Or
is this just returning some small number of memory-resident records
over and over? As it turns out, this level of performance seems to
exist for its own sake rather than to meet any real need.
If you look at the stats page at
http://www.cddb.com/hits/stats.html, for the 'Worldwide 30 Day
Activity Report', you will see that in the last 30 days, cddb has
served only 3,476,858 entries. That works out to about 1.34
hits/second averaged around the clock (4 hits/second would probably be
a reasonable peak load during the middle of the day). That being the
case, we don't need anything that even approaches a bazillion queries
a second. Rather, we just need something that is easy to implement
and use. You could probably match this using gophers and a pile of
notepads. Certainly Sybase running on our Linux box could serve this
many transactions.
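For what it's worth, here is a quick back-of-the-envelope check of
that average rate (the only input is the 30-day hit count from the
stats page above), written as a throwaway Python snippet:

    # Average query rate implied by cddb.com's 30-day hit count.
    hits_per_30_days = 3476858
    seconds_per_30_days = 30 * 24 * 60 * 60    # 2,592,000 seconds
    print(hits_per_30_days / seconds_per_30_days)   # ~1.34 hits/sec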
cddb.com could be underreporting their hits, I suppose...
I agree that a full RDBMS approach is probably more than some
people would want to bite off (even though the implementation would
be easy, there may not be many willing to dedicate their
Sybase/Oracle/whatever server to the job), but we should keep the
data store in a format that can be dumped into a database without
much work. This probably means a structured format like XML, or
possibly something simpler of our own design.
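Just to make the idea concrete, here is a rough sketch (in Python,
purely as illustration) of what one entry could look like as XML.
Every element and attribute name below (disc, discid, artist, title,
track) is made up on the spot, not a schema proposal:

    # Rough sketch of one disc entry as XML; all field names here
    # are hypothetical.
    import xml.etree.ElementTree as ET

    disc = ET.Element("disc", discid="7c0a1e09")
    ET.SubElement(disc, "artist").text = "Some Artist"
    ET.SubElement(disc, "title").text = "Some Album"
    tracks = ET.SubElement(disc, "tracks")
    for number, name in enumerate(["Track One", "Track Two"], start=1):
        track = ET.SubElement(tracks, "track", number=str(number))
        track.text = name

    print(ET.tostring(disc, encoding="unicode"))

Each <track> element would map onto one row in a tracks table, so
loading a dump like this into Sybase/Oracle/whatever should only take
a short script.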
-tim
(BTW, looking at the new-entry figures, cddb.com is currently
averaging 519 new entries per day. We don't need massive amounts of
data distribution power either.)