Specific Loudness Sensation
Loudness belongs to the category of intensity sensations. The
loudness of a sound is measured by comparing it to a reference
sound. The 1kHz tone is a very popular reference tone in
psychoacoustics, and the loudness of the 1kHz tone at 40dB is
defined to be 1 sone. A sound perceived to be twice as
loud is defined to be 2 sone and so on.
To calculate the loudness sensation from raw audio data, several transformations are necessary. The raw audio data is first decomposed into its frequencies using a discrete Fourier transformation. These frequencies are bundled according to the non-linear critical-band rate scale (bark). Then spectral masking effects are applied before the decibel values are calculated. The decibel values are transformed to equal loudness levels (phon), and finally the specific loudness sensation (sone) is calculated from these.
1. Power Spectrum
From the raw data the power spectrum is calculated using an FFT over a 23ms window (256 samples) weighted by a Hanning function. To increase the temporal resolution, the windows overlap by 50%. Figure 1 depicts the characteristics of the Hanning function and the effects it has on the power spectrum.
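The windowing and FFT step can be sketched as follows. This is a minimal illustration rather than the original implementation: the function names (hann, fft, power_spectrogram) are hypothetical, a small pure-Python radix-2 FFT stands in for whichever FFT routine was actually used, and keeping only the first 128 bins of each frame is an assumption made to match the 128 frequency values mentioned in the next step.

```python
import cmath
import math


def hann(n):
    """Hanning window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrogram(samples, frame_len=256, hop=128):
    """One power spectrum per frame; hop = frame_len / 2 gives 50% overlap."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        weighted = [s * w for s, w in zip(chunk, window)]
        spectrum = fft(weighted)
        # Assumption: keep the first frame_len / 2 = 128 bins (Nyquist bin dropped).
        frames.append([abs(c) ** 2 for c in spectrum[:frame_len // 2]])
    return frames
```

For a test tone that completes exactly 16 cycles per 256-sample window, the largest value of every frame falls into bin 16, with some energy leaking into the neighbouring bins because of the window.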
2. Critical-Bands [Bark]
The inner ear separates the frequencies and transfers and concentrates them at certain locations along the basilar membrane. The inner ear can be regarded as a complex system of band-pass filters with an asymmetrical frequency response. The center frequencies of these band-pass filters are closely related to the critical-band rates. Where these bands should be centered, and how wide they should be, has been analyzed in several psychoacoustic experiments. While we can distinguish low frequencies of up to about 500Hz well, our ability decreases above 500Hz, where the width of a critical band grows to approximately 0.2f, with f being the frequency. This is shown in experiments using a loud tone to mask a quieter one. At high frequencies these two tones need to be rather far apart in frequency, while at lower frequencies the quiet tone will still be noticeable at smaller distances. In addition to these masking effects, the critical bandwidth is also very closely related to just noticeable frequency variations. Within a critical-band it is difficult to notice any variations. This can be tested by presenting two tones to a listener and asking which of the two has the higher or lower frequency.
Since the critical-band scale has been used very frequently, it has been assigned a unit, the bark. The name was chosen in memory of Barkhausen, a scientist who introduced the phon to describe loudness levels, for which critical-bands play an important role. Figure 2 shows the main characteristics of this scale. At low frequencies below 500Hz the critical-bands are about 100Hz wide. The width of the critical-bands increases rapidly with the frequency. The 24th critical-band has a width of 3500Hz. The 9th critical-band has a center frequency of 1kHz. The critical-band rate is important for understanding many characteristics of the human ear.
A critical-band value is calculated by summing up the values of
the power spectrum within the respective lower and upper
frequency limits of the corresponding critical-band. The 128 frequency values obtained in the previous step are grouped into 20 frequency bands.
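The grouping step can be sketched as below. The band limits are the commonly tabulated critical-band edges in Hz for the first 20 bands. The sampling rate of 11025Hz is an assumption (256 samples in about 23ms implies roughly 11kHz), and the names BARK_EDGES_HZ and bark_bands are hypothetical.

```python
# Lower and upper limits (Hz) of the first 20 critical bands.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400]


def bark_bands(power_spectrum, sample_rate=11025.0, frame_len=256):
    """Sum the power-spectrum bins that fall inside each critical band."""
    bin_hz = sample_rate / frame_len  # frequency spacing between FFT bins
    bands = [0.0] * (len(BARK_EDGES_HZ) - 1)
    for k, power in enumerate(power_spectrum):
        freq = k * bin_hz
        for b in range(len(bands)):
            if BARK_EDGES_HZ[b] <= freq < BARK_EDGES_HZ[b + 1]:
                bands[b] += power
                break
    return bands
```

Under these assumptions the bins are about 43Hz apart, so the lowest bands receive only two or three bins each while the highest bands collect twenty or more. Bin 23 (about 990Hz) lands in the 9th band, whose center frequency is 1kHz.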
3. Masking Effects
As mentioned before, the critical-bands are closely related to masking effects. Masking is the occlusion of one sound by another sound. A loud sound might mask a simultaneous sound (simultaneous masking), or a sound closely following (post-masking) or preceding (pre-masking) it. Pre-masking is usually neglected since it can only be measured during about 20ms. Post-masking, on the other hand, can last longer than 100ms and ends after about a 200ms delay. Simultaneous masking occurs when the test sound and the masker are present simultaneously. For this thesis only a simple approximation of the simultaneous masking across the critical-bands was calculated.
4. Decibel
Before calculating sone values it is necessary to transform the data into decibel. The intensity of physical audio signals is given as sound pressure, which is measured in Pascal (Pa). The values of the PCM data correspond to the sound pressure. It is very common to transform the sound pressure into decibel (dB). The decibel is ten times the logarithm, to the base of 10, of the ratio between two amounts of power. Since power is proportional to the square of the pressure, the decibel value of a sound is calculated as 20 times the base-10 logarithm of the ratio between its pressure and the pressure of the hearing threshold, which is given by 20µPa.
5. Equal Loudness Levels [Phon]
The relationship between the sound pressure level in decibel and our hearing sensation measured in sone is not linear. The perceived loudness depends on the frequency of the tone. Figure 3 shows so-called loudness levels for pure tones, which are measured in phon. The phon is defined using the 1kHz tone and the decibel scale. For example, a pure tone at any frequency with 40 phon is as loud as a pure tone with 40dB at 1kHz. We are most sensitive to frequencies around 2kHz to 5kHz. The hearing threshold rises rapidly around the lower and upper frequency limits, which are about 20Hz and 16kHz respectively.
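As an illustration of the decibel conversion in step 4, the following sketch converts a sound pressure in Pascal to a level relative to the 20µPa hearing threshold, and a power value to decibel. The function names and the small floor values that guard against taking the logarithm of zero are assumptions made for this example.

```python
import math

P_REF_PA = 20e-6  # hearing threshold: 20 micropascal


def pressure_to_db_spl(pressure_pa, floor_pa=1e-9):
    """Sound pressure level in dB relative to the hearing threshold."""
    p = max(abs(pressure_pa), floor_pa)  # avoid log10(0) for silent samples
    return 20.0 * math.log10(p / P_REF_PA)


def power_to_db(power, ref_power=1.0, floor=1e-12):
    """Power ratio in dB; 10 * log10 because the input is already a power."""
    return 10.0 * math.log10(max(power, floor) / ref_power)
```

A pressure equal to the threshold gives 0dB, and every tenfold increase in pressure adds 20dB. Values taken from a power spectrum are already squared magnitudes, which is why the second function uses the factor 10 instead of 20.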
6. Specific Loudness Sensation [Sone]
The relationship between phon and sone can be seen in Figure 4. For low values up to 40 phon the sensation rises slowly until it reaches 1 sone at 40 phon. Beyond 40 phon the sensation increases at a faster rate.
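One widely used approximation of this relationship is sketched below. It is an assumption that this matches the curve in Figure 4 exactly: above 40 phon the loudness doubles for every additional 10 phon, and below 40 phon a power law with exponent 2.642 is used. The function name phon_to_sone is hypothetical.

```python
def phon_to_sone(phon):
    """Approximate loudness in sone for a loudness level in phon."""
    if phon >= 40.0:
        # Doubling of loudness for every 10 phon above 40 phon.
        return 2.0 ** ((phon - 40.0) / 10.0)
    # Assumed power-law approximation below 40 phon.
    return (phon / 40.0) ** 2.642
```

Both branches give 1 sone at 40 phon, which agrees with the definition of the sone, and 50 phon gives 2 sone, a sound perceived as twice as loud.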
7. Examples
Figures 5 and 6 illustrate the preprocessing steps so far. A detailed description of what can be seen in these figures can be found in the thesis.