Feature Extraction
A piece of music with a duration of 5 minutes is usually represented by about 13
million values. These values describe the physical properties of the acoustic waves that we hear. When analyzing this data, it is necessary to remove the irrelevant parts and emphasize the important features. Extracting these features from the raw data is the most critical step in creating a content-based
organization of a music collection. If it were possible to extract
one single feature that directly indicates which genre a piece of
music belongs to, everything else would be trivial.
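The figure of 13 million values can be reproduced with a short calculation. The sketch below assumes a single channel sampled at 44.1 kHz with 16-bit samples; these are CD-quality values that the text does not state explicitly, and the function names are illustrative only.

```python
# Assumed CD-quality parameters; the text does not state them explicitly.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BYTES_PER_SAMPLE = 2      # 16-bit PCM


def num_samples(duration_s: float, channels: int = 1) -> int:
    """Number of raw amplitude values for a recording of the given length."""
    return int(duration_s * SAMPLE_RATE_HZ * channels)


def raw_size_bytes(duration_s: float, channels: int = 1) -> int:
    """Uncompressed size of the recording in bytes."""
    return num_samples(duration_s, channels) * BYTES_PER_SAMPLE


five_minutes = 5 * 60
print(num_samples(five_minutes))                 # 13230000 values (mono)
print(raw_size_bytes(five_minutes, channels=2))  # 52920000 bytes (stereo)
```

Under these assumptions, a single five-minute stereo piece occupies roughly 53 MB uncompressed, so about a hundred such pieces already amount to several gigabytes.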
Good features should be intuitively meaningful, based on
psychoacoustic findings, and robust to variations that are
insignificant to our hearing sensation. Furthermore, they should
lead to an organization of the music collection that makes sense and not be too expensive to compute.
It is necessary to consider computational aspects because the raw data of even small
music collections easily consumes several gigabytes of storage. A
detailed analysis of all this information and all its possible
meanings would be computationally prohibitive. It is thus necessary to
reduce the amount of information to what is relevant with respect to
the overall goal, which is to organize music according to its genre. These genres
are not clearly defined and different people might assign the same
piece of music to different genres. However, there are some
attributes of the raw data that definitely do not determine the
genre. For example, removing the first second of a piece of music
does not change its genre, but the raw data compared bitwise will
be completely different. Generally, the duration of a piece of
music is not relevant. Nor does a particular melody define a genre. The same melody can be interpreted in different genre
styles just as different melodies might be members of the same
genre. Likewise, the number of instruments involved plays a minor
role in defining the genre.
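The point about removing the first second can be illustrated with a small experiment. The following is a minimal sketch, not the method used in this thesis: it assumes a synthetic signal (noise with a steady 2 Hz loudness pulse) in place of real music, a low sampling rate of 8 kHz to keep it fast, and the root-mean-square amplitude as a crude stand-in for a perceptually meaningful feature. All names are hypothetical.

```python
import math
import random

SAMPLE_RATE_HZ = 8_000  # assumed low rate to keep the example fast


def synthetic_piece(duration_s: int, seed: int = 0) -> list[float]:
    """Noise with a steady 2 Hz loudness pulse, standing in for real audio."""
    rng = random.Random(seed)
    samples = []
    for i in range(duration_s * SAMPLE_RATE_HZ):
        t = i / SAMPLE_RATE_HZ
        envelope = 0.6 + 0.4 * math.sin(2 * math.pi * 2 * t)
        samples.append(envelope * rng.uniform(-1.0, 1.0))
    return samples


def matching_fraction(a, b) -> float:
    """Fraction of positions at which two signals hold the same value."""
    n = min(len(a), len(b))
    return sum(1 for x, y in zip(a, b) if x == y) / n


def rms(signal) -> float:
    """Root-mean-square amplitude, a crude stand-in for overall loudness."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


piece = synthetic_piece(duration_s=10)
trimmed = piece[SAMPLE_RATE_HZ:]  # drop the first second

print(f"matching positions: {matching_fraction(piece, trimmed):.4%}")
print(f"RMS original: {rms(piece):.4f}, trimmed: {rms(trimmed):.4f}")
```

Compared position by position, the two signals share almost no values, while the summary feature stays nearly the same. A good feature should behave like the latter: insensitive to changes that do not affect how the music is perceived.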
One of the attributes that is rather typical for a genre is its
rhythm, which is why this thesis primarily focuses on the
dynamics of music, and in particular on the fluctuation strength of the specific loudness per critical band.