SALSA
Semantic Annotation by Learned Structured and Adaptive Signal Representations

The goal of SALSA is to bridge the semantic gap in music information research (MIR) by using adaptive and structured signal representations. The semantic gap is the difference in information content between the signal representations or models used in MIR and the high-level semantic descriptions used by musicians and audiences. Examples include the mapping from a signal representation to concrete content such as instrumentation, or to more abstract tags such as the emotional experience of music.

Recently developed methods from applied harmonic analysis make it possible to go beyond the standard time-frequency analysis prevalent in MIR by using signal representations that adapt to the inherent characteristics of musical signals. This will make it possible to obtain sparse representations in dictionaries of basic building blocks. The sparsity paradigm will, however, be complemented by assumptions on the representation coefficients that incorporate knowledge about the structures specific to the music signals under consideration.
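
As a rough illustration of this combination of sparsity and structure (the notation is generic and not taken from the project documents), a signal x can be approximated in a dictionary D of time-frequency atoms by solving a structured sparse coding problem of the form

\[
\hat{c} \;=\; \operatorname*{arg\,min}_{c}\; \tfrac{1}{2}\,\lVert x - D c \rVert_2^2 \;+\; \lambda \sum_{g \in \mathcal{G}} w_g\, \lVert c_g \rVert_2 ,
\]

where each group g in \mathcal{G} collects coefficients belonging to a musically meaningful structure (for instance the harmonics of one pitch, or the onsets within a beat), the weights w_g and the parameter \lambda balance data fidelity against structured sparsity, and choosing singleton groups recovers the plain \ell_1 sparsity penalty.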

The central questions of SALSA are (i) whether adaptive signal representations and structured sparse coefficient estimation improve learned mappings to high-level semantic concepts, and (ii) how high-level descriptions can guide the adaptation step in harmonic analysis. Answering these questions will enable an innovative form of musical signal analysis that is informed by, and adapts to, the rich semantic content music carries for human listeners.
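
The following sketch makes question (i) concrete. It is a minimal illustration rather than the project's code: it assumes PyTorch, and all module names, filter sizes and the number of semantic tags are hypothetical. The idea is to train the same tag classifier once on top of a fixed filterbank front end and once on top of a trainable one, so that any difference in performance can be attributed to the adaptive representation.

# Minimal sketch (assumed PyTorch, hypothetical names): a time-frequency front
# end whose filters can be fixed or trained jointly with a semantic tag classifier.
import torch
import torch.nn as nn

class AdaptiveFrontEnd(nn.Module):
    def __init__(self, n_filters=64, kernel_size=512, hop=256, learnable=True):
        super().__init__()
        # One filter per output channel, applied with stride = hop;
        # with learnable=False this acts as a fixed filterbank.
        self.filters = nn.Conv1d(1, n_filters, kernel_size, stride=hop,
                                 padding=kernel_size // 2, bias=False)
        self.filters.weight.requires_grad = learnable

    def forward(self, waveform):              # (batch, 1, samples)
        coeffs = self.filters(waveform)       # (batch, n_filters, frames)
        return torch.log1p(coeffs.abs())      # compressed magnitude "spectrogram"

class TagClassifier(nn.Module):
    def __init__(self, n_filters=64, n_tags=10, learnable_frontend=True):
        super().__init__()
        self.frontend = AdaptiveFrontEnd(n_filters, learnable=learnable_frontend)
        self.head = nn.Sequential(
            nn.Conv1d(n_filters, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, n_tags),
        )

    def forward(self, waveform):
        return self.head(self.frontend(waveform))    # logits over semantic tags

# Comparing learnable_frontend=True with False on the same tagging task is one
# way to probe whether an adaptive representation helps the semantic mapping.
model = TagClassifier(learnable_frontend=True)
logits = model(torch.randn(8, 1, 16000))              # 8 one-second clips at 16 kHz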

Publications

  • Bammer R., Dörfler M.: Invariance and Stability of Gabor Scattering for Music Signals, Proceedings of SAMPTA, 2017.
  • Bammer R., Dörfler M.: Modifying Signals in Transform Domain: a Frame-Based Inverse Problem, in Proceedings of the 19th International Conference on Digital Audio Effects (DAFx-16), 2016.
  • Bammer R., Dörfler M., Harar P.: Gabor Frames and Deep Scattering Networks in Audio Processing, Axioms, 8(4), 2019.
  • Breger A., Orlando J., Harar P., Dörfler M., Klimscha S., Grechenig C., Gerendas B., Schmidt-Erfurth U., Ehler M.: On Orthogonal Projections for Dimension Reduction and Applications in Variational Loss Function for Learning Problems, Journal of Mathematical Imaging and Vision, 2019.
  • Cordero E., De Gosson M., Dörfler M., Nicola F.: On the symplectic covariance and interferences of time-frequency distributions, SIAM Journal on Mathematical Analysis (SIMA), 50(2), pp. 2178-2193, 2018. DOI: https://doi.org/10.1137/16M1104615
  • Dörfler M.: Learning how to Listen: Time-Frequency Analysis meets Convolutional Neural Networks, Internationale Mathematische Nachrichten 1 (March), 2019.
  • Dörfler M., Bammer R., Breger A., Harar P., Smekal Z.: Improving Machine Hearing on Limited Data Sets, in Proceedings of ICUMT, 2019.
  • Dörfler M., Bammer R., Grill T.: Inside the Spectrogram: Convolutional Neural Networks in Audio Processing, Proceedings of SAMPTA, 2017.
  • Dörfler M., Faulhuber M.: Multi-Window Weaving Frames, Proceedings of SAMPTA, 2017.
  • Dörfler M., Grill T., Bammer R., Flexer A.: Basic Filters for Convolutional Neural Networks: Training or Design?, Neural Computing and Applications, first online: 24 September 2018. Also available as: https://arxiv.org/abs/1709.02291
  • Dörfler M., Velasco G.: Sampling time-frequency localized functions and constructing localized time-frequency frames, European Journal of Applied Mathematics, 28(5), pp. 854-876, 2017. DOI: https://doi.org/10.1017/S095679251600053X
  • Flexer A., Dörfler M., Schlüter J., Grill T.: Hubness as a case of technical algorithmic bias in music recommendation, in Proceedings of the 6th International Workshop on High Dimensional Data Mining (HDM), in conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2018), Singapore, 2018. Also available as: TR-2018-04.
  • Flexer A., Dörfler M., Schlüter J., Grill T.: Technical algorithmic bias in a music recommender (extended abstract), Late Breaking / Demos Session, 19th International Society for Music Information Retrieval Conference, 2018. Also available as: TR-2018-03.
  • Flexer A., Grill T.: The Problem of Limited Inter-rater Agreement in Modelling Music Similarity, Journal of New Music Research, Vol. 45, No. 3, pp. 239-251, 2016. DOI: http://dx.doi.org/10.1080/09298215.2016.1200631
  • Flexer A., Lallai T.: Can We Increase Inter- and Intra-Rater Agreement in Modeling General Music Similarity?, in Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, 2019. Also available as: TR-2019-01.
  • Gkiokas A., Lattner S., Katsouros V., Flexer A., Carayanni G.: Towards an Invertible Rhythm Representation, Proceedings of the 18th International Conference on Digital Audio Effects (DAFx-15), Trondheim, Norway, 2015.
  • Grill T., Schlüter J.: Two Convolutional Neural Networks for Bird Detection in Audio Signals, Proceedings of the 25th European Signal Processing Conference (EUSIPCO), 2017.
  • Grill T., Schlüter J.: Music boundary detection using neural networks on combined features and two-level annotations, in Proceedings of the 16th International Society for Music Information Retrieval Conference, pp. 531-537, 2015.
  • Grill T., Schlüter J.: Music boundary detection using neural networks on spectrograms and self-similarity lag matrices, in Proceedings of the 23rd European Signal Processing Conference (EUSIPCO 2015), pp. 1306-1310, Nice, France, 2015.
  • Holzapfel A., Benetos E.: The Sousta Corpus: Beat-Informed Automatic Transcription of Traditional Dance Tunes, Proceedings of the 17th International Society for Music Information Retrieval Conference, 2016.
  • Holzapfel A., Grill T.: Bayesian Meter Tracking on Learned Signal Representations, Proceedings of the 17th International Society for Music Information Retrieval Conference, 2016.
  • Lattner S., Dörfler M., Arzt A.: Learning complex basis functions for invariant representations of audio, in Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, 2019.
  • Ottosen E., Dörfler M.: A Phase Vocoder based on Nonstationary Gabor Frames, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(11), pp. 2199-2208, 2017. DOI: https://doi.org/10.1109/TASLP.2017.2750767
  • Schlüter J., Grill T.: Exploring Data Augmentation for Improved Singing Voice Detection with Neural Networks, in Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015.
  • Srinivasamurthy A., Holzapfel A., Cemgil A. T., Serra X.: A generalized Bayesian model for tracking long metrical cycles in acoustic music signals, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016.
