November 5, 2013: **NOTE:** An updated software package for hubness analysis is
available at our project homepage: http://ofai.at/research/impml/projects/hubology.html

This is the main evaluation script for re-running the complete evaluation of the work submitted to JMLR. Matlab is required to run the scripts.

Download mp_scripts-v2.zip (72MB)

The following datasets are included in the download:

*corel-corel1000.db, cp-c1ka-twitter.db, cp-c224a-web.db, kr-amlall.db, kr-lungcancer.db, kr-ovarian-61902.db, libsvm-australian.db, libsvm-breast-cancer (sc).db, libsvm-colon-cancer.db, libsvm-diabetes (sc).db, libsvm-duke (train).db, libsvm-fourclass (sc).db, libsvm-ger.num (sc).db, libsvm-heart (sc).db, libsvm-ionosphere (sc).db, libsvm-liver-disorders (sc).db, libsvm-sonar (sc).db, libsvm-splice (sc).db, mirex-ballroom.db, mirex-ismir2004.db, pabo-movie-reviews.db, uci-arcene.db, uci-dexter.db, uci-dorothea.db, uci-gisette.db, uci-mfeat-factors.db, uci-mfeat-karhunen.db, uci-mfeat-pixels.db, uci-mini-newsgroups.db, uci-reuters-transcribed.db*

To run the evaluation, extract the files in mp_scripts-v2.zip, start Matlab,
and call `eval_mld('*')`. Note that the full evaluation takes about a day to
complete. If the script is called with the second parameter set to true, as in
`eval_mld('*', 1)`, the computationally expensive Goodman-Kruskal index is
computed as well.
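To illustrate why the Goodman-Kruskal index is expensive, here is a minimal Python sketch (not the Matlab code from the package) of the common cluster-validity form of the index: every within-class distance is compared with every between-class distance, and the index is (concordant - discordant) / (concordant + discordant). The function name `goodman_kruskal_index` is hypothetical, ties are simply ignored, and whether `eval_mld` uses exactly this variant is an assumption. A naive pairwise comparison is O(n^4) in the number of objects; the sketch sorts the between-class distances to count faster, but the index is still far costlier than the other measures.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def goodman_kruskal_index(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Goodman-Kruskal gamma over (within-class, between-class) distance pairs.

    Hypothetical helper: +1 means every within-class distance is smaller than
    every between-class distance, -1 means the opposite. Ties are not counted.
    """
    n = len(labels)
    within, between = [], []
    # Split all unordered object pairs into within- and between-class distances.
    for i in range(n):
        for j in range(i + 1, n):
            (within if labels[i] == labels[j] else between).append(dist[i][j])

    between.sort()
    concordant = discordant = 0
    for w in within:
        # Between-class distances strictly larger than w: concordant pairs.
        concordant += len(between) - bisect_right(between, w)
        # Between-class distances strictly smaller than w: discordant pairs.
        discordant += bisect_left(between, w)

    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0
```

With two tight, well-separated classes the index is 1.0; assigning the labels across the clusters instead drives it to -1.0.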

To evaluate a single database, pass the desired collection as the
parameter: `eval_mld('corel-corel1000.db');`

The Matlab output then looks like this:

    Collection: corel1000 (n=1000)
    size: 1000, classes: 10, dim: 192, intrinsic dim: 9
    Original (l_2) - S^{k=1}: 1.83, C^{k=1}: 70.7%
                     S^{k=5}: 1.45, C^{k=5}: 65.2%
                     S^{k=20}: 1.52, C^{k=20}: 63.9%
                     SYMM^{k=5}: 35.8%, SYMM^{k=10%}: 42.1%
    NICDM - S^{k=1}: 1.00, C^{k=1}: 72.9%
            S^{k=5}: 0.39, C^{k=5}: 72.0%
            S^{k=20}: 0.63, C^{k=20}: 72.3%
            SYMM^{k=5}: 69.8%, SYMM^{k=10%}: 70.0%
    MP (Empiric) - S^{k=1}: 0.83, C^{k=1}: 71.6%
                   S^{k=5}: 0.31, C^{k=5}: 70.3%
                   S^{k=20}: 0.05, C^{k=20}: 69.0%
                   SYMM^{k=5}: 64.0%, SYMM^{k=10%}: 69.2%

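As in the paper, *S^{k}* refers to the hubness (the skewness of the
k-occurrence distribution), *C^{k}* to the k-nearest-neighbor classification
accuracy, and *SYMM^{k}* to the percentage of symmetric nearest-neighbor
relations. The output reports *S^{k}* and *C^{k}* for k = 1, 5, 20 and
*SYMM^{k}* for k = 5 and k = 10%.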
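The three measures can be sketched in a few lines of Python (an illustration, not the package's Matlab code). The function names are hypothetical. Hubness is the skewness of the k-occurrence counts N_k (how often each object appears in the k-NN lists of the others); accuracy is a leave-one-out k-NN majority vote with arbitrary tie-breaking; and the normalization of the symmetry percentage (symmetric relations divided by all n*k relations) is an assumption about how *SYMM^{k}* is computed.

```python
from collections import Counter


def knn_indices(dist, k):
    """Indices of the k nearest neighbours of every object (self excluded)."""
    n = len(dist)
    return [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
            for i in range(n)]


def hubness(dist, k):
    """Skewness of the k-occurrence distribution N_k (population moments)."""
    n = len(dist)
    occ = [0] * n
    for nn in knn_indices(dist, k):
        for j in nn:
            occ[j] += 1
    mean = sum(occ) / n  # always equals k
    m2 = sum((o - mean) ** 2 for o in occ) / n
    m3 = sum((o - mean) ** 3 for o in occ) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def symmetric_percentage(dist, k):
    """Percentage of k-NN relations x -> y for which y -> x also holds (assumed normalization)."""
    nn = [set(s) for s in knn_indices(dist, k)]
    n = len(dist)
    sym = sum(1 for x in range(n) for y in nn[x] if x in nn[y])
    return 100.0 * sym / (n * k)


def knn_accuracy(dist, labels, k):
    """Leave-one-out k-NN classification accuracy in percent (majority vote)."""
    hits = 0
    for i, nn in enumerate(knn_indices(dist, k)):
        vote = Counter(labels[j] for j in nn).most_common(1)[0][0]
        hits += vote == labels[i]
    return 100.0 * hits / len(labels)
```

In a star-shaped configuration where one central object is everyone's nearest neighbor, the k-occurrence distribution is right-skewed, so `hubness` is positive and most neighbor relations are not symmetric.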

The Mutual Proximity function is called `norm_mp_empiric()`
(in file `norm/norm_mp_empiric.m`) and can be used with any
distance matrix.
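For readers without Matlab, the idea behind the empirical variant can be sketched in Python as follows; this is not a port of `norm_mp_empiric.m`. Following the definition of Mutual Proximity, MP(d_xy) is the fraction of objects j that are farther from x than y is and also farther from y than x is, and the rescaled distance is 1 - MP. The name `mutual_proximity_empiric`, the assumption of a symmetric input matrix, and the normalization by n (x and y never satisfy the strict inequalities, so they drop out automatically) are assumptions of this sketch.

```python
def mutual_proximity_empiric(dist):
    """Empirical Mutual Proximity for a symmetric n x n distance matrix.

    Returns a new matrix with entries 1 - MP(d_xy), where MP(d_xy) is the
    fraction of objects j with dist[x][j] > d_xy and dist[y][j] > d_xy.
    Hypothetical helper; the O(n^3) loops are for clarity, not speed.
    """
    n = len(dist)
    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            d = dist[x][y]
            # Objects that are "farther away" from both x and y than d.
            shared = sum(1 for j in range(n) if dist[x][j] > d and dist[y][j] > d)
            out[x][y] = out[y][x] = 1.0 - shared / n
    return out
```

For four points on a line at positions 0, 1, 2 and 10, the pair (0, 1) shares one object farther from both, giving 1 - 1/4 = 0.75, while the pair (2, 10) shares none and gets the maximal distance 1.0.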

Implemented variants of MP are:

- Empiric (full): `norm_mp_empiric.m`
- Gauss (full): `norm_mp_gauss.m`
- Gauss (independence): `norm_mp_gaussi.m`
- Gamma (independence): `norm_mp_gammai.m`

*Dominik Schnitzer, Last Update: July 31, 2012*