Re: Classifying music samples (MP3/WAV/...)

Johan Pouwelse (pouwelse nospam at pds.twi.tudelft.nl)
Thu, 23 Sep 1999 10:18:01 +0200 (MET DST)

A very nice idea, and one worth exploring!

> > But a hash of the first few bars of
> > each song on the CD should do the trick.
> But it wouldn't since every MP3 converter, for example, would produce
> a different .mp3 and that in turn when decoded would produce a different
> raw file. That's why some sound analysis would have to be done to
> properly classify a random .mp3, .wav, .au, or the raw input from a
> CD, which itself could be corrupted slightly without an audible
> indication.
The MP3 perceptual model applies frequency masking and temporal masking,
so the output of different encoders sounds largely the same to the human
auditory system. Some music properties are preserved even under heavy
compression at 32 kbps.

After the subband filtering and quantization step, it is still possible to
compose a 1024-bit value from every track on a CD.
However, processing the entire contents of an audio CD before a match can
be made takes too long, even on 40x CD-ROM drives.

A practical version of this idea would read only the first few seconds of
every audio CD track and reduce them in real time to, for example, a
1024-bit value. At 128 bytes per track and about 15 tracks per CD, that is
roughly 1920 bytes per CD, a reasonable size.
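The trade-off between reading a whole CD and reading only the start of each
track can be put into rough numbers. The Python sketch below is idealised:
it assumes a 74-minute CD, a constant 40x read speed, no seek or spin-up
time, and 3.2 s of audio read per track (32 frames of 100 ms). Only the 40x
speed, the 15 tracks and the 1024 bits come from the post; the rest are
illustrative assumptions, and real 40x drives only reach that speed at the
outer edge of the disc.

```python
# Rough, idealised numbers for the "read only the first seconds" idea.
# Assumptions (not from the post): a 74-minute audio CD, a constant 40x
# read speed, no seek or spin-up time, and 3.2 s of audio read per track
# (32 frames of 100 ms each).

CD_MINUTES = 74
TRACKS_PER_CD = 15          # from the post
READ_SPEED = 40             # multiples of the 1x audio rate, from the post
SECONDS_PER_TRACK = 3.2     # audio actually read per track (assumption)
SIGNATURE_BITS = 1024       # from the post

# Red Book audio: 44.1 kHz, 16-bit (2 bytes), stereo (2 channels).
BYTES_PER_SECOND_1X = 44_100 * 2 * 2


def read_time(audio_seconds: float, speed: float = READ_SPEED) -> float:
    """Wall-clock seconds to read `audio_seconds` of audio at `speed`x."""
    return audio_seconds / speed


whole_cd = read_time(CD_MINUTES * 60)
first_seconds = read_time(TRACKS_PER_CD * SECONDS_PER_TRACK)
raw_bytes_read = TRACKS_PER_CD * SECONDS_PER_TRACK * BYTES_PER_SECOND_1X
signature_bytes = TRACKS_PER_CD * SIGNATURE_BITS // 8

print(f"whole CD at {READ_SPEED}x: {whole_cd:.0f} s")
print(f"first {SECONDS_PER_TRACK} s of each track: {first_seconds:.1f} s")
print(f"raw audio read: {raw_bytes_read / 1e6:.1f} MB")
print(f"signatures per CD: {signature_bytes} bytes")
```

Under these assumptions a full read takes about 111 s, while the opening
3.2 s of all 15 tracks take about 1.2 s plus seek time, and the stored
signatures come to 1920 bytes per CD.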

Essentially, the real-time transformation compresses the first seconds of
a track to 1024 bits!!! I don't think signal metrics such as peak signal
strength or the average peak-to-peak signal strength ratio are helpful
here. A simple quantisation of the energy distribution in the frequency
spectrum would probably do the trick, with a time resolution of 100
milliseconds and a frequency resolution of 500 Hz. I have only taken a
single course in digital signal processing and audio compression, so
perhaps I'm wrong about this. Who can take this concept further?
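One possible reading of "a simple quantisation of the energy distribution"
is sketched below in Python. It is an assumption, not a tested design: each
100 ms frame is split into 500 Hz bands, and each band becomes one bit that
is 1 when its energy exceeds the median band energy of that frame. Using 32
frames (3.2 s) and 32 bands (0-16 kHz) gives exactly 1024 bits. The names
`fingerprint`, `band_energies` and `hamming`, the median threshold and the
32 x 32 layout are hypothetical; the FFT is a plain radix-2 Cooley-Tukey
written out so the example needs no third-party libraries.

```python
import cmath
import math
from typing import List, Sequence

# Hypothetical parameters: 32 frames x 32 bands = 1024 bits.
FRAME_SECONDS = 0.1   # 100 ms time resolution (from the post)
BAND_HZ = 500.0       # 500 Hz frequency resolution (from the post)
N_FRAMES = 32         # 32 x 100 ms = 3.2 s of audio (assumption)
N_BANDS = 32          # 32 x 500 Hz covers 0-16 kHz (assumption)


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_energies(frame: Sequence[float], sample_rate: int) -> List[float]:
    """Sum of squared FFT magnitudes in each 500 Hz band (zero-padded FFT)."""
    n_fft = 1
    while n_fft < len(frame):
        n_fft *= 2
    padded = [complex(s) for s in frame] + [0j] * (n_fft - len(frame))
    spectrum = fft(padded)
    energies = [0.0] * N_BANDS
    for k in range(n_fft // 2):  # positive frequencies only
        band = int(k * sample_rate / n_fft // BAND_HZ)
        if band < N_BANDS:       # ignore bins above 16 kHz
            energies[band] += abs(spectrum[k]) ** 2
    return energies


def fingerprint(samples: Sequence[float], sample_rate: int) -> int:
    """1 bit per (frame, band): is the band above the frame's median energy?"""
    frame_len = int(sample_rate * FRAME_SECONDS)
    if len(samples) < N_FRAMES * frame_len:
        raise ValueError("need at least N_FRAMES * 100 ms of mono audio")
    bits = 0
    for f in range(N_FRAMES):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energies = band_energies(frame, sample_rate)
        median = sorted(energies)[N_BANDS // 2]
        for e in energies:
            bits = (bits << 1) | (1 if e > median else 0)
    return bits  # fits in 1024 bits = 128 bytes


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Because each bit only compares a band with the other bands of the same
frame, a change in overall volume should leave the signature (nearly)
unchanged, and two copies of a track would be compared by Hamming distance
rather than exact equality, e.g. `hamming(fingerprint(a, 44100),
fingerprint(b, 44100))` on lists of mono samples. Whether this survives
real MP3 encoders at 32 kbps would have to be measured.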

Johan.