Semantic Annotation by Learned Structured and Adaptive Signal Representations (SALSA)

A project sponsored by the Vienna Science and Technology Fund (WWTF)

Project lead: Monika Dörfler, NuHAG (Numerical Harmonic Analysis Group), Faculty of Mathematics, University of Vienna
Partners: Arthur Flexer, OFAI


The goal of SALSA is to bridge the semantic gap in music information research (MIR) by using adaptive and structured signal representations. The semantic gap is the difference in information content between the signal representations or models used in MIR and the high-level semantic descriptions used by musicians and audiences. Bridging it means, for example, mapping a signal representation to concrete content such as instrumentation, or to more abstract tags such as the emotional experience of music.

Recently developed methods from applied harmonic analysis make it possible to go beyond the standard time-frequency analysis prevalent in MIR by using signal representations that adapt to the inherent characteristics of musical signals. This makes it possible to obtain sparse representations in dictionaries of basic building blocks. The sparsity paradigm will, however, be complemented by assumptions on the representation coefficients that incorporate knowledge about the structures specific to the music signals under consideration.

The central questions of SALSA are (i) whether adaptive signal representations and structured sparse coefficient estimation improve learned mappings to high-level semantic concepts and (ii) how high-level descriptions can guide the adaptation step in harmonic analysis. Answering these questions will allow for an innovative form of musical signal analysis that is informed by, and adapts to, the rich semantic content music has for human listeners.


SALSA members organized a workshop on "Systematic approaches to deep learning methods for audio" (11-15 September 2017, Erwin Schrödinger Institute, University of Vienna, Austria).