November 5, 2013:
NOTE: An updated software package for hubness analysis is
available at our project homepage: http://ofai.at/research/impml/projects/hubology.html
This is the main evaluation script for re-running the complete evaluation of the work submitted to JMLR. Matlab is required to run the scripts.
Download mp_scripts-v2.zip (72MB)
The following datasets are included in the download
To run the evaluation, extract the files from the downloaded archive (mp_scripts-v2.zip), start Matlab, and call eval_mld('*'). Note that this takes about a day to complete. If the script is called with the second parameter set to true, as in eval_mld('*', 1), the Goodman-Kruskal Index, which is expensive to compute, is included in the computation.
To evaluate a single database, pass the desired collection as the parameter: eval_mld('corel-corel1000.db');
Then the Matlab output will look like:
    Collection: corel1000 (n=1000)
      size: 1000, classes: 10, dim: 192, intrinsic dim: 9
    Original (l_2) -
      S^{k=1}: 1.83, C^{k=1}: 70.7%
      S^{k=5}: 1.45, C^{k=5}: 65.2%
      S^{k=20}: 1.52, C^{k=20}: 63.9%
      SYMM^{k=5}: 35.8%, SYMM^{k=10%}: 42.1%
    NICDM -
      S^{k=1}: 1.00, C^{k=1}: 72.9%
      S^{k=5}: 0.39, C^{k=5}: 72.0%
      S^{k=20}: 0.63, C^{k=20}: 72.3%
      SYMM^{k=5}: 69.8%, SYMM^{k=10%}: 70.0%
    MP (Empiric) -
      S^{k=1}: 0.83, C^{k=1}: 71.6%
      S^{k=5}: 0.31, C^{k=5}: 70.3%
      S^{k=20}: 0.05, C^{k=20}: 69.0%
      SYMM^{k=5}: 64.0%, SYMM^{k=10%}: 69.2%
As in the paper, S^{k=5} refers to the hubness, C^{k=1,5} to the classification accuracies, and SYMM^{k=5,10%} to the percentage of symmetric nearest neighbor relations.
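To make the S^{k} column more concrete: hubness is commonly measured as the skewness of the k-occurrence distribution, that is, of how often each object appears in the k-nearest-neighbor lists of the other objects. The following is a minimal sketch of that measure, not the code used by eval_mld. The random data, the variable names and the use of the Euclidean distance are assumptions made for illustration only.

```matlab
% Sketch only: hubness as the skewness of the k-occurrence distribution.
% X is made-up toy data, not one of the collections in the download.
rng(0);                               % fixed seed for a repeatable example
X = randn(200, 50);                   % 200 points in 50 dimensions
n = size(X, 1);
k = 5;

% Pairwise Euclidean distances without toolbox functions.
sq = sum(X.^2, 2);
D = sqrt(max(bsxfun(@plus, sq, sq') - 2 * (X * X'), 0));
D(1:n+1:end) = Inf;                   % a point is never its own neighbor

% Row i of knn holds the indices of the k nearest neighbors of point i.
[~, idx] = sort(D, 2);
knn = idx(:, 1:k);

% N_k(j): number of neighbor lists in which point j occurs.
Nk = accumarray(knn(:), 1, [n 1]);

% Skewness of N_k (third standardized moment, population form).
mu = mean(Nk);
sigma = std(Nk, 1);
S = mean((Nk - mu).^3) / sigma^3;
fprintf('S^{k=%d} = %.2f\n', k, S);
```

A value near zero indicates that neighbor occurrences are spread evenly, while a clearly positive value indicates that a few objects (hubs) occur in many neighbor lists.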
The Mutual Proximity function is called norm_mp_empiric() (in file norm/norm_mp_empiric.m) and can be used with any distance matrix.
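As a rough illustration of what an empirical Mutual Proximity rescaling does to a distance matrix, the sketch below counts, for each pair (x, y), the objects that are farther from both x and y than x and y are from each other. It is a stand-alone example and not the contents of norm_mp_empiric.m. The toy matrix D, the variable names and the normalization by n are assumptions.

```matlab
% Sketch only: empirical Mutual Proximity on a small symmetric distance matrix.
% D is a made-up 4x4 example, not data from the package.
D = [0 1 4 5;
     1 0 2 6;
     4 2 0 3;
     5 6 3 0];
n = size(D, 1);
Dmp = zeros(n);                       % diagonal stays 0

for x = 1:n
    for y = x+1:n
        d = D(x, y);
        % Objects j that are farther than d from both x and y.
        farther = (D(x, :) > d) & (D(y, :) > d);
        mp = sum(farther) / n;        % similarity in [0, 1]; dividing by n is an assumption
        Dmp(x, y) = 1 - mp;           % convert the similarity to a distance
        Dmp(y, x) = Dmp(x, y);        % keep the result symmetric
    end
end

disp(Dmp);
```

With the package on the Matlab path, the corresponding call would presumably be Dmp = norm_mp_empiric(D); that single-argument form is an assumption, so check norm/norm_mp_empiric.m for the actual signature.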
Implemented variants of MP are:
Dominik Schnitzer, Last Update: July 31, 2012