auditory models and self-organising maps (Christian Spevak)

Subject: auditory models and self-organising maps
From:    Christian Spevak  <christian(at)SPEVAK.DE>
Date:    Mon, 12 Jun 2000 16:25:59 +0100

Dear list,

thank you very much for your insightful replies (Peter, Martin & Jont) to my inquiry concerning the frame rate for auditory models. They were really helpful but, of course, raised new questions as well. Jim Stevenson asked me to say more about the self-organising map, so I am posting a more detailed description of this stage.

For different genres of music it is generally not possible to arrange the multitude of different sound events in pre-defined classes, as can be done with phonemes for speech. It is therefore tempting to use a self-organising artificial neural network, such as the self-organising map (SOM), for the classification of the pre-processed signals. The SOM was developed by Teuvo Kohonen and was inspired by feature maps in the cerebral cortex.

A SOM is able to map high-dimensional input signals onto a two- or three-dimensional grid while preserving their topological relations, so that similar input signals are usually mapped next to each other. The feature map thus provides a measure of similarity. In this case the input signal is a sequence of vectors from the auditory model, corresponding to one sound event. Each vector is mapped to a point on the feature map, and the whole sequence can be represented as a trajectory.

Like all artificial neural networks, a self-organising map needs to be trained to adapt its weight vectors to the distribution of input signals. The training data is presented to the network up to 100,000 times during the ordering process. The training data should be similar to the data intended for classification to achieve the best results, although a SOM is able to generalise to a certain extent.

Our goal is to match the classification results of the SOM with our perception as closely as possible, although it is not easy to define what 'perceptual similarity' means for dynamic sounds.
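
To make the training and mapping steps described above more concrete, here is a minimal sketch in Python. It is only an illustration of a generic SOM, not the implementation used in this work. The grid size, the linearly decaying learning rate and neighbourhood width, the Gaussian neighbourhood function, and the names train_som, best_matching_unit and trajectory are all assumptions made for the example.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return (row, col) of the unit whose weight vector is closest to `vector`."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Squared Euclidean distance between unit weights and input.
            dist = sum((wi - vi) ** 2 for wi, vi in zip(w, vector))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows=6, cols=6, steps=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rows x cols map on `data` (a list of equal-length vectors).

    All schedule parameters are illustrative assumptions, not values
    taken from the post.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weight vector for every unit on the grid.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for t in range(steps):
        frac = t / steps
        lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
        x = rng.choice(data)                     # present one training vector
        br, bc = best_matching_unit(weights, x)
        for r in range(rows):
            for c in range(cols):
                # Gaussian neighbourhood around the winner, measured on the grid.
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
                w = weights[r][c]
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def trajectory(weights, sequence):
    """Map a sequence of feature vectors to a sequence of grid positions."""
    return [best_matching_unit(weights, v) for v in sequence]


if __name__ == "__main__":
    # Toy stand-in for auditory-model frames: two clusters of 3-D vectors.
    rng = random.Random(1)
    low = [[rng.uniform(0.0, 0.2) for _ in range(3)] for _ in range(50)]
    high = [[rng.uniform(0.8, 1.0) for _ in range(3)] for _ in range(50)]
    som = train_som(low + high)
    # A "sound event" moving from one cluster to the other.
    event = [[0.1, 0.1, 0.1], [0.5, 0.5, 0.5], [0.9, 0.9, 0.9]]
    print(trajectory(som, event))
```

Each input vector is assigned to its best-matching unit, so a sequence of frames becomes a path across the grid. The sketch uses far fewer presentations than the 100,000 mentioned above, purely to keep the example quick to run.
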
Thank you for any further suggestions,
Christian Spevak

---
Christian Spevak
University of Hertfordshire, UK
Music Department
christian(at)

This message came from the mail archive
maintained by:
DAn Ellis <>
Electrical Engineering Dept., Columbia University