Towards automatic annotation of electro-acoustic music

A project sponsored by the Austrian National Science Foundation (FWF)
Project Number: P21247

This project investigated the automatic annotation of electro-acoustic music through the application of machine learning methods. Electro-acoustic music is a contemporary form of electronic composition and production that originated in the 1940s. It is made with electronic technology, using synthesized sounds or sounds prerecorded in nature or the studio, which are often extensively processed and altered. Compared to the analysis of instrumental or vocal music, the annotation of electro-acoustic music is both more challenging and less developed: there are no "pre-segmented" discrete units like notes, no score, and no universally established system of analysis. Although musicology has developed various sets of tools for the analysis of electro-acoustic music, the tediousness of manual annotation has prevented the application of these theories to a larger body of music. Music Information Retrieval, on the other hand, has developed a rich repertoire of machine learning algorithms for the analysis of music, including methods that can be used for automatic annotation. Machine learning is a subfield of artificial intelligence concerned with the design and development of algorithms and techniques that allow computers to learn from data (e.g. the relationship between audio representations of music and semantic descriptions).

Our essential result is that machine learning methods can indeed be used for the annotation of electro-acoustic music, but only in an interactive setting. Only the integration of a human analyst into the workflow makes it possible to sidestep the seeming impasse posed by the lack of ground truth in the annotation of electro-acoustic music. Even annotations of traditional music are communal, cultural constructs in their social context rather than ground truth. This holds all the more for electro-acoustic music, which has an inquisitive nature and the constant exploration and deconstruction of established musical parameters at its very heart.
A human analyst in front of the computer, making all analytical decisions and interpreting the output of a repertoire of machine learning algorithms, is able to compensate for the computer's lack of semantic comprehension. In our project we developed two approaches to such interactive exploration of the structural and sonic nature of electro-acoustic compositions. One provides a structural overview of complete pieces of music, while the other allows the identification and clustering of representative sound groupings at a more detailed level. To show the potential of our methods, we applied both interactive approaches to two renowned electro-acoustic compositions, John Chowning's "Turenas" and Denis Smalley's "Wind Chimes". Bringing together musicological theories of electro-acoustic music and machine learning methods in this way will help to accelerate the process of annotation as well as stabilize its results and make them more reproducible. This will in turn contribute to the theoretical coverage and practice of electro-acoustic music.
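The general idea behind the second approach, grouping similar sounds by unsupervised clustering of frame-wise audio features, can be illustrated with a minimal sketch. This is not the project's actual implementation; the feature (spectral centroid), the clustering method (a tiny 1-D k-means), and all names and parameters are illustrative assumptions.

```python
import numpy as np

def spectral_centroid_frames(signal, sr, frame=1024, hop=512):
    """Frame-wise spectral centroid (in Hz) of a mono signal."""
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    window = np.hanning(frame)
    centroids = []
    for start in range(0, len(signal) - frame + 1, hop):
        mag = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        centroids.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    return np.array(centroids)

def kmeans_1d(x, k=2, iters=50):
    """Minimal k-means for 1-D features; returns cluster labels and centres."""
    centres = np.quantile(x, np.linspace(0.0, 1.0, k))  # deterministic init
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = x[labels == j].mean()
    return labels, centres

# Toy "composition": one second of a low tone followed by one of a high tone.
sr = 16000
t = np.arange(sr) / sr
audio = np.concatenate([np.sin(2 * np.pi * 220 * t),
                        np.sin(2 * np.pi * 3000 * t)])

feats = spectral_centroid_frames(audio, sr)  # one feature value per frame
labels, centres = kmeans_1d(feats, k=2)      # two "sound groupings"
```

In an interactive setting, an analyst would inspect such groupings, choose the number of clusters, and decide which features are musically meaningful, rather than accepting the machine's output as given.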


Doerfler M., Velasco G., Flexer A., Klien V.: Sparse Regression in Time-Frequency Representations of Complex Audio, Proceedings of the 7th Sound and Music Computing Conference (SMC'10), Barcelona, Spain, 2010. also available as: TR-2010-08.

Flexer A., Schnitzer D.: Album and Artist Effects for Audio Similarity at the Scale of the Web, Proceedings of the 6th Sound and Music Computing Conference (SMC'09), Porto, Portugal, 2009. also available as: TR-2009-01.

Flexer A., Schnitzer D.: Effects of Album and Artist Filters in Audio Similarity Computed for Very Large Music Databases, Computer Music Journal, Volume 34, Number 3, pp. 20-28, 2010. also available as: TR-2010-01.

Gasser M., Flexer A.: On Computing Morphological Similarity of Audio Signals, Oesterreichisches Forschungsinstitut fuer Artificial Intelligence, Wien, TR-2010-14, 2010.

Grill T.: Re-texturing the sonic environment, Proceedings of the 5th Audio Mostly Conference: A Conference on Interaction with Sound, pp. 42-48, 2010. also available as: TR-2010-12.

Klien V., Grill T., Flexer A.: Because we are all falling down. Physics, gestures and relative realities, Proceedings of the International Computer Music Conference (ICMC'10), New York City, NY, USA, 2010. also available as: TR-2010-04.

Klien V., Grill T., Flexer A.: Towards automated annotation of acousmatic music, Proceedings of the Electronic Music Studies Network Conference 2010 (EMS'10), Shanghai, China, 2010. also available as: TR-2010-09.

Klien V., Grill T., Flexer A.: On automated annotation of acousmatic music, Oesterreichisches Forschungsinstitut fuer Artificial Intelligence, Wien, TR-2011-06, 2011.