On High Dimensional Data Analysis in Music Information Retrieval

A project sponsored by the Austrian National Science Foundation (FWF)
Project Number: P27082

Learning in high dimensional spaces poses a number of challenges which are collectively referred to as the curse of dimensionality. Music Information Retrieval (MIR), the interdisciplinary science of retrieving information from music, very often relies on high dimensional feature representations and models. A new aspect of the curse of dimensionality, the so-called hubness, was first documented and established in MIR as a problem of computing music similarity. Hub songs are, according to the music similarity function, similar to very many other songs and as a consequence appear in very many recommendation lists, preventing other songs from being recommended at all. The hubness phenomenon has since been identified as a general problem of machine learning in high dimensional spaces. It is due to distance concentration, the property that causes all points in a high dimensional data space to be at almost the same distance to each other.
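To make the notion of a hub concrete: in the literature, hubness is often quantified as the skewness of the k-occurrence distribution, where the k-occurrence of a song counts how often it appears among the k nearest neighbors of all other songs. The following Python sketch illustrates that measure on synthetic Gaussian data. The function names, the Euclidean distance and the random data are assumptions made for illustration and are not taken from the project's code.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_occurrence(points, k):
    """Count how often each point is among the k nearest neighbors of the others."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # All other points, ordered from nearest to farthest as seen from point i.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(points[i], points[j]),
        )
        for j in neighbors[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; positive values indicate a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def hubness(dim, n=200, k=5, seed=0):
    """Skewness of the k-occurrence distribution for n random Gaussian points."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    return skewness(k_occurrence(points, k))
```

Comparing hubness(2) with hubness(50) gives a rough impression of how the skewness tends to grow with dimensionality; the exact values depend on the sample and on the assumed parameters.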

Our own previous research efforts have focused on the impact of distance concentration and hubness on nearest neighbor based music recommendation and genre classification. As a result we have developed a general unsupervised method to pre-process and rescale distance spaces which is able to decisively diminish hubness and its adverse effects in music databases as well as in general machine learning datasets. Research by our own and other groups has also made it clear that concentration and hubness have an impact on many more distance based algorithms used in high dimensional data analysis. This proposed project will explore existing approaches and develop new ones to deal with these problems by studying their effects on a wide range of methods in MIR, but also in multimedia and machine learning. In particular we are planning to (i) study and unify rescaling methods to avoid distance concentration, and (ii) explore the role of hubness in unsupervised (clustering, visualization) and supervised learning (classification) in high dimensional spaces.
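The paragraph above does not name the rescaling method. One rescaling of this kind, listed among the toolbox methods further down, is Mutual Proximity. The Python sketch below shows its empirical variant as it is commonly described: two objects are close if many other objects are farther away from both of them. It is a minimal illustration, not the authors' implementation; the function name and the normalization by n - 2 are assumptions.

```python
def mutual_proximity(dist):
    """Rescale a symmetric distance matrix (list of lists) with empirical Mutual Proximity.

    Returns secondary distances in [0, 1], computed as 1 - MP.
    Assumption: MP is normalized by n - 2, the number of objects other than the pair.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three objects")
    rescaled = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            d = dist[x][y]
            # Count objects j that are farther from x than y is
            # and at the same time farther from y than x is.
            shared = sum(
                1
                for j in range(n)
                if j != x and j != y and dist[x][j] > d and dist[y][j] > d
            )
            mp = shared / (n - 2)
            rescaled[x][y] = rescaled[y][x] = 1.0 - mp
    return rescaled
```

Because the rescaled value for a pair depends on the neighborhoods of both objects, a hub that is close to many objects only from its own point of view no longer receives small distances automatically.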

The main focus of this project is on MIR since this is where the majority of results on hubness and concentration exist. Evaluating our results in the broader fields of multimedia and machine learning will, however, ensure that our research has the potential to solve an important problem in MIR and at the same time a general problem of learning in high dimensional spaces.


Feldbauer R., Flexer A.: Centering versus Scaling for Hubness Reduction, in Proceedings of the 25th International Conference on Artificial Neural Networks (ICANN'16), Part I, pp. 175-183, Springer International Publishing, 2016. Also available as: TR-2016-05.

Flexer A.: Improving visualization of high-dimensional music similarity spaces, 16th International Society for Music Information Retrieval Conference, Malaga, Spain, 2015. Also available as: TR-2015-03.

Flexer A.: The impact of hubness on music recommendation, Machine Learning for Music Discovery Workshop at the 32nd International Conference on Machine Learning, Lille, France, 2015. Also available as: TR-2015-02.

Flexer A.: On inter-rater agreement in audio music similarity, Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR'14), Taipei, Taiwan, 2014. Also available as: TR-2014-06.

Flexer A., Grill T.: The Problem of Limited Inter-rater Agreement in Modelling Music Similarity, Journal of New Music Research, 2016 (published online on 5th of July 2016). DOI: http://dx.doi.org/10.1080/09298215.2016.1200631

Flexer A., Schnitzer D.: Choosing lp norms in high-dimensional spaces based on hub analysis, Neurocomputing, Volume 169, pp. 281-287, 2015. DOI: http://dx.doi.org/10.1016/j.neucom.2014.11.084

Flexer A., Stevens J.: Mutual proximity graphs for music recommendation, Proceedings of the 9th International Workshop on Machine Learning and Music, Riva del Garda, Italy, 2016. Also available as: TR-2016-06.

Schnitzer D., Flexer A.: The Unbalancing Effect of Hubs on K-medoids Clustering in High-Dimensional Spaces, Proceedings of the International Joint Conference on Neural Networks, Killarney, Ireland, 2015. Also available as: TR-2015-01.


Matlab implementations of algorithms developed in a previous project are published as the HUB Toolbox. The full toolbox is available as a .zip file.

The toolbox includes Matlab code for Mutual Proximity (with different distance distribution assumptions), Local Scaling and Shared Nearest Neighbors. Evaluation functions include hubness analysis functions, the Goodman-Kruskal index, KNN evaluation functions and a function to estimate the intrinsic dimensionality. Please refer to the README and the corresponding publication for a detailed description.
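As an illustration of one of the listed methods, the following Python sketch shows Local Scaling in the form in which it is commonly described: each distance is rescaled by the distances of both objects to their respective k-th nearest neighbor. It is a minimal sketch and not the toolbox code; the function name, the default k = 7 and the handling of zero scales are assumptions.

```python
import math


def local_scaling(dist, k=7):
    """Rescale a symmetric distance matrix (list of lists) with Local Scaling.

    Secondary distance: 1 - exp(-d(x, y)^2 / (sigma_x * sigma_y)),
    where sigma_i is the distance from object i to its k-th nearest neighbor.
    """
    n = len(dist)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of objects")
    # Local scale of each object: distance to its k-th nearest neighbor (self excluded).
    sigma = [
        sorted(dist[i][j] for j in range(n) if j != i)[k - 1] for i in range(n)
    ]
    rescaled = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            scale = sigma[x] * sigma[y]
            if scale > 0:
                value = 1.0 - math.exp(-dist[x][y] ** 2 / scale)
            else:
                # Assumption: with a zero local scale, identical objects stay at
                # distance 0 and all other pairs get the maximum distance 1.
                value = 0.0 if dist[x][y] == 0 else 1.0
            rescaled[x][y] = rescaled[y][x] = value
    return rescaled
```

The function expects a full pairwise distance matrix, for example one computed from music similarity features, and returns a matrix of the same shape with values in [0, 1].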

Previous Research on Hubness

Please see the information on our previous project on “Preventing Hubness in Music Information Retrieval”.

Additional sponsoring

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.