Re: Classifying music samples (MP3/WAV/...)

Marc van Woerkom (van.woerkom nospam at netcologne.de)
Thu, 23 Sep 1999 11:43:27 +0200 (CEST)

> It seems to me that there has to be a better way to classify music
> than to rely on a CD TOC. How hard would it be to perform
> some sound analysis of the underlying waveform, extract some
> fundamental characteristics and then use those as the key
> to a database? Granted, it's not going to be very simple, but
> is a lot simpler than, say, voice recognition.
> For example,
> Record the time of the first N pitch/power/volume peaks.
>
> Surely someone must have done something like this already?

Something like this.

I worked for two years on a database system for spectral and structural
data from the domain of physical chemistry (IR/UV/Vis./mass/C-NMR, H-NMR,
X-NMR spectra, chemical structures).

The spectra came in two flavours: continuous (= many points sampled) or
peak spectra (= reduced to samples at the peaks that were identified
by some algorithm, using Savitzky-Golay and such).
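
A rough sketch of that reduction from a continuous spectrum to a peak
spectrum, for illustration only. It is not the code of the system
described here. It assumes the standard 5-point quadratic
Savitzky-Golay smoothing coefficients (-3, 12, 17, 12, -3)/35 and a
naive local-maximum test; the names smooth_sg5, pick_peaks and
min_height are made up for this example.

```python
# Sketch only: 5-point Savitzky-Golay smoothing followed by naive
# local-maximum peak picking. All names here are hypothetical.

SG5 = (-3, 12, 17, 12, -3)  # 5-point quadratic smoothing coefficients
SG5_NORM = 35               # normalisation factor for SG5


def smooth_sg5(ys):
    """Return a smoothed copy of ys; two points at each end stay as-is."""
    out = list(ys)
    for i in range(2, len(ys) - 2):
        window = ys[i - 2:i + 3]
        out[i] = sum(c * y for c, y in zip(SG5, window)) / SG5_NORM
    return out


def pick_peaks(xs, ys, min_height=0.0):
    """Reduce a sampled spectrum to (x, y) pairs at smoothed local maxima."""
    sm = smooth_sg5(ys)
    peaks = []
    for i in range(1, len(sm) - 1):
        is_max = sm[i] > sm[i - 1] and sm[i] >= sm[i + 1]
        if is_max and sm[i] >= min_height:
            peaks.append((xs[i], sm[i]))
    return peaks


if __name__ == "__main__":
    xs = list(range(11))
    ys = [0, 0, 1, 4, 9, 10, 9, 4, 1, 0, 0]
    print(pick_peaks(xs, ys, min_height=1.0))
```

A real peak picker would also have to deal with noise, baselines and
overlapping peaks, which this sketch ignores.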

The hard thing is to have such a system working for a realistic database
of 500,000 and more entries with reasonable time, space and quality.

It is very easy to become a victim of combinatorial explosion.

If you can't make assumptions on the structure of your data,
you are fsck-ed and have to touch every piece of data during
a search. Then you rely on an effective pass through the data
coming out of the database (where relational DBs suck IMHO).
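
To make the "touch every piece of data" case concrete, here is a
minimal sketch of an exhaustive search. It assumes every entry has
already been reduced to a fixed-length feature vector held in memory
and that plain Euclidean distance is an acceptable comparison; both
are assumptions for this example, and linear_search and euclidean are
made-up names.

```python
# Sketch only: exhaustive nearest-neighbour search. Every entry is
# compared against the query, so the cost grows linearly with the
# number of entries in the database.
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linear_search(query, entries, k=3, dist=euclidean):
    """entries: iterable of (entry_id, vector). Returns the k closest
    entries as (distance, entry_id) tuples, nearest first."""
    scored = ((dist(query, vec), eid) for eid, vec in entries)
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    db = [("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [5.0, 5.0])]
    print(linear_search([1.0, 1.0], db, k=2))
```

With 500,000 entries this means 500,000 distance computations per
query, which is why the speed at which data streams out of the
database matters so much.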

If you can make assumptions, like in the case of structural
data, you can have huge speed improvements, but your algorithms
are non-trivial (for structures you would have to be good at
graph theory).

In the case of acoustic data, one will have to agree first
on how to reduce the data to a smaller, characteristic set
of data (e.g. the dominant contributions of a Fourier or
Wavelet decomposition of the signal) and on a metric d(s1, s2)
that defines the distance between two signals, and then one will
have the same problems as I had with the spectroscopic
system (how to store/retrieve/search that data efficiently).
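
One possible reading of that reduction and of d(s1, s2), as a hedged
sketch: take a naive discrete Fourier transform, keep only the m
strongest magnitude bins, and use the Euclidean distance between the
reduced spectra. The choice of m, the decision to ignore phase, and
the names dft_magnitudes and dominant are assumptions made for this
example, not an agreed standard.

```python
# Sketch only: reduce a signal to its dominant Fourier contributions
# and compare two signals by the distance between those reductions.
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns magnitudes of bins 0 .. n//2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags


def dominant(samples, m=4):
    """Keep the m strongest bins as a {bin_index: magnitude} dict."""
    mags = dft_magnitudes(samples)
    top = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:m]
    return {k: mags[k] for k in top}


def d(s1, s2, m=4):
    """Euclidean distance between reduced spectra; a bin that is
    missing from one of the two reductions counts as magnitude 0."""
    f1, f2 = dominant(s1, m), dominant(s2, m)
    bins = set(f1) | set(f2)
    return math.sqrt(sum((f1.get(k, 0.0) - f2.get(k, 0.0)) ** 2
                         for k in bins))


if __name__ == "__main__":
    n = 32
    a = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
    b = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
    print(dominant(a, m=1), d(a, a), d(a, b, m=1))
```

Because phase is thrown away, two different signals can end up at
distance 0, so this is at best a pseudo-metric. A real system would
use an FFT and work on short windows rather than on a whole track.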

What we use is data (positional data of the tracks) that
is gathered quickly and relatively easily across platforms.

The next step will be harder, because for what you have in
mind we would need a good low-level/ripper library
(= transforming audio data from a CD into a wav file) that
must be available on many platforms and for many devices
(IDE and SCSI) to set a standard.

Right now only the cdda2wav library from Jörg Schilling
qualifies for this purpose. I have some hopes for
paranoia IV as well.

I am sure that such a system can be built in the free
software scene: processing power and storage space
have become cheap enough, or will become cheap
enough soon.

If you want to work on this, I can give you some hints,
but I myself still have to do some (more trivial) homework
before tackling something like this.

Regards,
Marc