T A L K
**********************

Oesterreichisches Forschungsinstitut fuer Artificial Intelligence (OFAI) der OSGK
Freyung 6/6, A-1010 Wien
Tel: +43-1-5336112-17, Fax: +43-1-5336112-77, Email: sec@ofai.at

Bob L. Sturm, Ph.D.
Associate Professor
Audio Analysis Lab, AD:MT
Aalborg University Copenhagen
http://media.aau.dk/null_space_pursuits/

"THE CRISIS OF EVALUATION IN MIR"

I critically address the "crisis of evaluation" in music information retrieval (MIR), with particular emphasis on music genre recognition, music mood recognition, and autotagging. I demonstrate four things:

1) many published results unknowingly use datasets with faults that render those results meaningless;
2) state-of-the-art ("high classification accuracy") systems are fooled by irrelevant factors;
3) most published results are based upon an invalid evaluation design; and
4) a lot of work has unknowingly built, tuned, tested, compared and advertised "horses" instead of solutions. (The example of the horse Clever Hans provides an appropriate illustration.)

I argue these problems occur because:

1) many researchers assume a dataset is a good dataset because many others use it;
2) many researchers assume that evaluation that is standard in machine learning or information retrieval is useful and relevant for MIR;
3) many researchers mistake systematic, rigorous, and standardized evaluation for scientific evaluation; and
4) problems and success criteria remain ill-defined, and thus evaluation remains poor, because researchers do not define appropriate use cases.

I show how this "crisis of evaluation" can be addressed by formalizing evaluation in MIR to make clear its aims, parts, design, execution, interpretation, and assumptions. I also present several alternative evaluation approaches that can separate horses from solutions.

*********

Time: Wednesday, 30th October 2013, 6.30 p.m. sharp
Location: Oesterreichisches Forschungsinstitut fuer Artificial Intelligence, OFAI
Freyung 6, Stiege 6, 1010 Wien

OESTERREICHISCHES FORSCHUNGSINSTITUT FUER ARTIFICIAL INTELLIGENCE
Univ.-Prof. Ing. Dr. Robert Trappl
*********