Khoomei Acoustic Analysis

Khoomei is a form of throat singing used in traditional music from Tuva. In fact, khoomei is the Mongolian word for "throat." What makes throat singing remarkable is that a single singer can produce multiple voices at the same time.

The baudline spectrum analyzer was used to analyze track 6 from Huun-Huur-Tu's CD "60 Horses in My Herd." A sample of the audio file (kokhoomei.mp3) can be found here:

The baudline spectrogram image below is a 10-second cut that displays 16 kHz of bandwidth. The purple and green colors represent the left and right channels.

Four distinct voices are visible among the rich harmonic structure. The foundation is a constant drone with its fundamental at 109 Hz. This sound is made in the throat, and its third harmonic is about 10 dB louder than the fundamental. The second voice looks like stair steps and lies in the 700 to 1500 Hz range. The third voice looks much like a mirror image of the second voice and lies in the 2500 to 3000 Hz range. The fourth voice lies in the 3300 to 3500 Hz range. The complex and interesting shapes above 4000 Hz are nasal and breathing sounds.
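
As a quick check on those numbers, the short Python sketch below computes the third harmonic of the 109 Hz drone and converts the 10 dB level difference into linear ratios. It assumes the 10 dB figure is a level difference read off the spectrogram; the function name db_to_ratios is illustrative and not part of baudline.

```python
def db_to_ratios(db):
    """Convert a level difference in dB to (amplitude ratio, power ratio)."""
    amplitude_ratio = 10 ** (db / 20)
    power_ratio = 10 ** (db / 10)
    return amplitude_ratio, power_ratio

FUNDAMENTAL_HZ = 109  # drone frequency reported in the text

third_harmonic_hz = 3 * FUNDAMENTAL_HZ
amp, power = db_to_ratios(10)

print(third_harmonic_hz)               # 327
print(round(amp, 2), round(power, 2))  # 3.16 10.0
```

In other words, the strongest component of the drone sits near 327 Hz, with roughly three times the amplitude (ten times the power) of the 109 Hz fundamental.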

The next spectrogram is the same 10-second cut as above, but the frequency axis has been zoomed in to display 4 kHz of bandwidth. This zoomed view shows the harmonic structure in better detail.

Two features are very interesting in this spectrogram.

1) The frequencies of the second voice (700 - 1500 Hz) are limited to values that are harmonics. This means that every note is an integer multiple of the fundamental and that frequencies in between are not possible. This observation hints at how this unique sound is produced: parts of the spectrum are being attenuated or amplified by a filtering process that resembles a cavity resonance.
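
To make the harmonic constraint concrete, the following minimal sketch lists the integer multiples of the 109 Hz fundamental that fall inside the 700 to 1500 Hz range. It takes the fundamental and the band edges from the text as exact values, which is an assumption; the helper name harmonics_in_band is illustrative.

```python
def harmonics_in_band(fundamental_hz, low_hz, high_hz):
    """Return (harmonic number, frequency) pairs inside [low_hz, high_hz]."""
    result = []
    n = 1
    while n * fundamental_hz <= high_hz:
        if n * fundamental_hz >= low_hz:
            result.append((n, n * fundamental_hz))
        n += 1
    return result

# Fundamental and band edges as quoted in the text.
second_voice = harmonics_in_band(109, 700, 1500)

for n, freq_hz in second_voice:
    print(n, freq_hz)
```

Under these assumptions the second voice has only seven available pitches, harmonics 7 through 13 (763 Hz to 1417 Hz), which is consistent with the stair-step appearance in the spectrogram.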

2) The third voice (2500 - 3000 Hz) looks like a mirror image of the second voice. When the second voice goes up, the third voice goes down, and vice versa. It is not an exact mirror image, but 1900 Hz looks like the center pivot frequency. Signal-processing operations such as decimation or sample rate conversion without anti-alias filtering produce a similar mirror-image effect through aliasing. Exactly how this relates to the mechanics at work in the vocal tract is unknown.
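
The mirror relationship and its resemblance to aliasing can be sketched in a few lines of Python. The first function reflects a frequency about the approximate 1900 Hz pivot. The second computes the apparent frequency of a tone sampled without filtering, using a hypothetical sample rate of 3800 Hz chosen only so that the folding (Nyquist) frequency coincides with the pivot. This is an analogy for illustration, not a model of the vocal tract.

```python
PIVOT_HZ = 1900  # approximate pivot frequency read from the spectrogram

def mirror(freq_hz, pivot_hz=PIVOT_HZ):
    """Reflect a frequency about the pivot."""
    return 2 * pivot_hz - freq_hz

def alias(freq_hz, sample_rate_hz):
    """Apparent frequency of a tone sampled with no anti-alias filter."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# Mirror of the second voice's band edges.
print(mirror(700), mirror(1500))  # 3100 2300

# Hypothetical sample rate whose Nyquist frequency equals the pivot.
print(alias(2500, 2 * PIVOT_HZ))  # 1300
```

An exact reflection of the 700 to 1500 Hz band would span 2300 to 3100 Hz, somewhat wider than the observed 2500 to 3000 Hz, which agrees with the remark that the mirror image is not exact.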


