two ideas

A. Lester Buck III (buck nospam at compact.com)
Mon, 15 Mar 1999 03:59:54 -0600

I have been following the discussion about the CDDB database
being turned private for the past few days. I first learned
about it through a reference on the slink-e mailing list,
a list for discussing details about home automation with the
Slink-e controller. Slink-e is able to read enough track timing
information through the Sony S-Link interface to calculate the
CDDB disk id, even though the Sony S-Link changers are only
CD-Audio drives. I would hate to see the GPL replacement for
CDDB require lead-in and similar data that is available only to
CD-ROM drives reading CD-Audio disks.
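
To illustrate why plain track timing is enough, here is a minimal
Python sketch of the disc id calculation as it is commonly published
for CDDB clients. The function and parameter names are illustrative
and the sample offsets are made up. It assumes the start offset of
each track and of the lead-out are known in frames (75 per second);
the id itself only uses those offsets rounded down to whole seconds.

```python
def digit_sum(n):
    """Sum the decimal digits of a non-negative integer."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb_disc_id(track_offsets, leadout_offset):
    """Return the 32-bit CDDB disc id as an integer.

    track_offsets  -- start of each track in frames (75 frames per second)
    leadout_offset -- start of the lead-out in frames
    """
    frames_per_second = 75
    # Checksum: digit sums of every track's start time in whole seconds.
    checksum = sum(digit_sum(off // frames_per_second) for off in track_offsets)
    # Playing time in seconds, from the first track to the lead-out.
    total_seconds = (leadout_offset // frames_per_second
                     - track_offsets[0] // frames_per_second)
    return ((checksum % 0xFF) << 24) | (total_seconds << 8) | len(track_offsets)


# Made-up three-track disc: tracks start at 2 s, 202 s and 402 s.
example_id = cddb_disc_id([150, 15150, 30150], 45150)
print("%08x" % example_id)  # prints 0c025803
```

Nothing in this calculation needs data beyond track start times and
the total playing time.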

That said, I have two general ideas that you might find of
interest.

First, how do we "recapture" the existing CDDB data? It was
donated by individuals, and the current license terms try to
shut out any competing CD databases: the license that most
CDDB client developers will sign does not allow them to
access other CD databases. The opportunity is
that the *users* sign nothing, and the software runs on millions
of their machines. Therefore, we enlist the users of the CDDB
client software application to help us.

How do we do that? We ask users to instrument their machines for
us. As the recent progress of the Happy99.exe worm demonstrated
in a dramatic way, it is possible to patch WSOCK32.DLL and
monitor the protocol being transmitted on a TCP connection.
Happy99.exe traps NNTP and SMTP. We would write a version that
traps the CDDB protocol inbound and outbound packets and sends
the same traffic to open GPL servers. This application wouldn't
really be an uninvited worm, but a software shim that consenting
users voluntarily apply to tap into the Win32 access points and
transparently capture the appropriate traffic. With millions
of CDDB users, a relatively small fraction of public-spirited
open software advocates, say 0.1% of the user population, might
easily capture 99+% of all the CDDB entries. This software shim
would sit on download sites right next to the CDDB clients that
are controlled by oppressive licensing restrictions. Within a
few months, this transparently extracted data should be plentiful
enough to make the open database a match for the original. Are
there any public statistics for the access distribution of CD data?
With some real data, we could predict how many users' machines
would have to be instrumented, and for how long, to capture a
given fraction of the CDDB database.
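
As a rough sketch of that prediction: if each captured lookup hits
disc i with probability p_i, independently of the others, the
expected fraction of the database seen after n lookups is the
average of 1 - (1 - p_i)^n over all discs. The Python below works
that out. The Zipf-shaped popularity curve is purely an assumption
standing in for the real access statistics, and the function names
and numbers are illustrative.

```python
def zipf_popularity(num_discs, exponent=1.0):
    """Hypothetical popularity: disc at rank r gets weight 1 / r**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_discs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_coverage(popularity, lookups):
    """Expected fraction of discs seen at least once after `lookups` queries.

    Assumes every lookup independently picks disc i with probability
    popularity[i].
    """
    seen = sum(1.0 - (1.0 - p) ** lookups for p in popularity)
    return seen / len(popularity)


if __name__ == "__main__":
    popularity = zipf_popularity(100000)  # assumed database size
    for lookups in (10**5, 10**6, 10**7):
        print(lookups, round(expected_coverage(popularity, lookups), 3))
```

With measured access data in place of zipf_popularity, the number of
lookups needed divides by lookups per user per day to give the number
of instrumented machines and the time required. The rarely played
discs in the tail are what determine how long full coverage takes.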

For some background on the Happy99 worm, see

http://www.symantec.com/avcenter/venc/data/happy99.worm.html

The second problem is how to access the CD database. The name
"database" is somewhat misleading: it is a database, but more
specifically it is a directory. Here is what we mean by a
directory, quoting from:

http://www.umich.edu/~dirsvcs/ldap/doc/guides/slapd/1.html#RTFToC1

1.1 What is a directory service?

A directory is like a database, but tends to contain more
descriptive, attribute-based information. The information
in a directory is generally read much more often than it is
written. As a consequence, directories don't usually implement
the complicated transaction or roll-back schemes regular
databases use for doing high-volume complex updates. Directory
updates are typically simple all-or-nothing changes,
if they are allowed at all. Directories are tuned to give
quick-response to high-volume lookup or search operations. They
may have the ability to replicate information widely in order
to increase availability and reliability, while reducing
response time. When directory information is replicated,
temporary inconsistencies between the replicas may be OK,
as long as they get in sync eventually.

There are many different ways to provide a directory
service. Different methods allow different kinds of information
to be stored in the directory, place different requirements
on how that information can be referenced, queried and
updated, how it is protected from unauthorized access,
etc. Some directory services are local, providing service to
a restricted context (e.g., the finger service on a single
machine). Other services are global, providing service to
a much broader context (e.g., the entire Internet). Global
services are usually distributed, meaning that the data they
contain is spread across many machines, all of which cooperate
to provide the directory service. Typically a global service
defines a uniform namespace which gives the same view of the
data no matter where you are in relation to the data itself.

A CDDB replacement is obviously a directory service.
The international standard for directory services is X.500, and
X.500 was architected to be the white pages for the world, storing
phone numbers, email addresses, and all sorts of other attribute
information for potentially billions of people. The currently
popular subset of X.500 is the Lightweight Directory Access
Protocol, or LDAP. You may have an LDAP client built into your
browser, as all recent versions of Netscape have an LDAP client
built into the address book for searching various email directory
services. Microsoft's Active Directory coming in Windows 2000 is
built around LDAP. The University of Michigan has an open source
implementation of LDAP, and I had that distribution installed,
compiled, and executing simple name queries in under an hour
on UNIX. Netscape has a free download to test their Directory
Server, and a full-featured toolkit for writing applications.
IBM includes LDAP as a fundamental component in their e-commerce
server, including a full toolkit. LDAP servers are currently
reasonably common, and they will be everywhere within a year.

LDAP specifies powerful search features, all the necessary
protocols to run over TCP/IP, and a way to express search requests
as appropriately constructed ldap:// URLs (RFC 2255).
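
As a sketch of what such a URL could look like for a CD lookup, the
Python below assembles one in the LDAP URL layout of RFC 2255, which
is ldap://host:port/dn?attributes?scope?filter. The host name, base
DN, and the cdDiscId, cdArtist and cdTitle attribute names are all
hypothetical; no such schema or server exists yet.

```python
from urllib.parse import quote


def ldap_search_url(host, base_dn, attributes, scope, search_filter, port=389):
    """Build an ldap:// search URL: host:port/dn?attributes?scope?filter."""
    return "ldap://{}:{}/{}?{}?{}?{}".format(
        host,
        port,
        quote(base_dn, safe="=,"),             # keep DN punctuation readable
        ",".join(attributes),
        scope,                                 # "base", "one" or "sub"
        quote(search_filter, safe="=()&|!*"),  # keep filter syntax readable
    )


# Hypothetical lookup of one disc by its id.
print(ldap_search_url(
    "ldap.example.org",
    "ou=discs,o=freecd",
    ["cdArtist", "cdTitle"],
    "sub",
    "(cdDiscId=0c025803)",
))
```

Characters such as spaces in the DN or filter are percent-escaped;
everything else is passed through unchanged.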

Bringing up a CDDB replacement on LDAP requires only defining a
set of new schema entries in the configuration language of the
given implementation and installing the schema in the server.
Sample code for LDAP clients is everywhere, from the mozilla.org
code base to the Michigan distribution to Netscape's directory
SDK toolkit to IBM's developer toolkit.
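
To make the schema idea concrete, here is a small Python sketch that
renders one CD record as an LDIF entry, the text format LDAP servers
use for loading data. The cdDisc object class, the cdDiscId,
cdArtist, cdTitle and cdTrackTitle attributes, and the base DN are
all hypothetical names that a real schema would have to define. The
sketch also assumes plain ASCII values; other values would need the
base64 form of LDIF.

```python
def disc_to_ldif(disc_id, artist, title, tracks, base_dn="ou=discs,o=freecd"):
    """Render one disc as LDIF text using hypothetical attribute names."""
    lines = [
        "dn: cdDiscId={},{}".format(disc_id, base_dn),
        "objectClass: top",
        "objectClass: cdDisc",
        "cdDiscId: {}".format(disc_id),
        "cdArtist: {}".format(artist),
        "cdTitle: {}".format(title),
    ]
    # Values of a multi-valued attribute carry no order, so the track
    # number is stored as a prefix of each value.
    for number, track in enumerate(tracks, start=1):
        lines.append("cdTrackTitle: {:02d} {}".format(number, track))
    return "\n".join(lines) + "\n"


print(disc_to_ldif("0c025803", "Some Artist", "Some Album",
                   ["First", "Second", "Third"]))
```

Using the disc id as the naming attribute makes the common lookup a
direct read of a single entry.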

Here are some links for more information:

http://www.umich.edu/~dirsvcs/ldap/index.html
http://www.kingsmountain.com/ldapRoadmap.shtml
http://www3.innosoft.com/ldapworld/ldapfaq.html
http://www.stanford.edu/%7Ehodges/talks/EMA98-DirectoryServicesRollout/Steve_Kille/index.htm

Well, those are my two ideas for a CDDB replacement: a
user-installed shim to tap protocol packets from existing CDDB
applications, and an LDAP schema for the CD data elements.

"Information wants to be free."

Best regards,

Lester

--
A. Lester Buck		buck nospam at compact.com