Acoustic Correlates of Speaker Sex Identification: Implications for the Transsexual Voice
Ralph O. Coleman
Counselors who number transsexuals among their clients often report that the gender characteristic most resistant to convincing change is the voice. This seems particularly true of individuals whose sex change has been from male to female. Although it would seem that simply raising the pitch of the voice to the range appropriate for females would be effective, under those circumstances a distinct male voice quality often persists (Coleman,
This seeming paradox is understandable in the context of vocal acoustics. A predominant theory on this subject, commonly referred to as the "source-filter" theory (Fant, 1980; Liberman, 1977), holds that the voice consists of two major acoustic components: first, the tone generated by the larynx, and second, the modulating effect on that tone of the vocal tract, which acts as a series of acoustically coupled resonators.
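The two components of the source-filter account can be illustrated with a minimal synthesis sketch. It assumes a crude source (one unit impulse per glottal cycle, standing in for the puffs of air) and models each vocal tract resonance with a Klatt-style second-order digital resonator; the resonator choice, the single shared bandwidth, the sample rate, and all function names (`glottal_source`, `resonate`, `synthesize`) are illustrative assumptions, not a method described in this article.

```python
import math

def klatt_resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2],
    normalized so the gain at 0 Hz is exactly 1."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c

def resonate(signal, freq_hz, bw_hz, fs):
    """Pass a signal through one resonance (one 'formant') of the filter."""
    a, b, c = klatt_resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1  # shift the two-sample output history
    return out

def glottal_source(f0_hz, duration_s, fs):
    """Source: one unit impulse per vibratory cycle, spaced fs / f0 samples apart."""
    n = int(duration_s * fs)
    period = fs / f0_hz
    out = [0.0] * n
    k = 0
    while round(k * period) < n:
        out[round(k * period)] = 1.0
        k += 1
    return out

def synthesize(f0_hz, formants_hz, fs=16000, duration_s=0.5, bw_hz=80.0):
    """Source-filter synthesis: the source sets the periodicity (pitch),
    the cascaded resonators set where energy is concentrated (formants).
    One bandwidth for every formant is a simplifying assumption."""
    signal = glottal_source(f0_hz, duration_s, fs)
    for f in formants_hz:
        signal = resonate(signal, f, bw_hz, fs)
    return signal

# Illustrative (assumed) formant values for an /a/-like vowel; not data from this article.
male_tract = [730.0, 1090.0, 2440.0]
female_tract = [850.0, 1220.0, 2810.0]

typical_male = synthesize(120.0, male_tract)
mixed = synthesize(220.0, male_tract)  # female-range source, male-range filter
```

Because source and filter are separate arguments, the same sketch can pair a female-range fundamental with male-range resonances, which is the kind of mixed stimulus whose perception the paragraphs that follow discuss.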
The mechanics of the first component of this theory are well understood. In brief, the approximated vocal folds are forcibly blown apart by the build-up of air pressure from the lungs beneath them, and a puff of air is released. The folds then re-approximate through a combination of forces, including tissue elasticity, the drop in subglottic air pressure that occurs when the folds are blown apart, and an aerodynamic (Bernoulli) effect that draws the folds toward the midline. With the folds re-approximated, subglottic air pressure again builds up until it overcomes the resistance provided by the vocal folds, and the cycle repeats. The resulting series of air puffs takes on a tonal quality whose periodicity is determined by a complex of factors, including the mass and tension of the vocal folds and the level of subglottic air pressure. Because the larger mass of the male vocal folds provides greater resistance to being blown apart, the vibratory cycle is slower than in females, and the resulting lower pitch is a prominent perceptual feature of the male voice.
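How mass and tension govern the rate of vibration can be sketched with the idealized vibrating-string approximation, f0 = sqrt(T / mu) / (2L), where T is tension, mu is mass per unit length, and L is vibrating length. This is an assumed simplification (real vocal folds are layered tissue driven by airflow), not the model used in this article, and the parameter values below are hypothetical, chosen only to show ratios.

```python
import math

def string_f0(length_m: float, tension_n: float, mass_per_length_kg_m: float) -> float:
    """Fundamental frequency (Hz) of an ideal string fixed at both ends:
    f0 = sqrt(T / mu) / (2 * L)."""
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2.0 * length_m)

def period_ms(f0_hz: float) -> float:
    """Duration of one vibratory cycle in milliseconds."""
    return 1000.0 / f0_hz

# Hypothetical parameters, used only for relative comparisons.
base = string_f0(0.015, 0.04, 0.0001)
heavier = string_f0(0.015, 0.04, 0.0004)  # four times the mass per unit length
tenser = string_f0(0.015, 0.16, 0.0001)   # four times the tension
```

In this model, quadrupling the mass per unit length at fixed tension and length halves f0 (a longer cycle and a lower pitch), while quadrupling tension doubles it, which matches the qualitative direction of the paragraph above.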
The second acoustic component of the voice results from the action of the vocal tract on the tone generated at the glottis. The vocal tract is a continuous tube extending from the glottis to the lips. Because of the larger overall size of adult males, the male vocal tract averages about 20% longer than that of females (Fant, 1966). In speech, the resonating cavities of the vocal tract "shape" the glottal tone in accordance with the laws of acoustic resonance. Since the male vocal tract is longer than that of females, the frequencies at which vocal energy is concentrated in the male speech signal can be expected to be somewhat lower. For an adult population, the magnitude of this difference, as expressed in vowel formants,¹ is about 20% (Peterson & Barney, 1952). This physiological difference has long been recognized; its perceptual prominence, however, has only recently been examined. Several studies (Brown & Feinstein, 1977; Coleman, 1971, 1976; Lass, Hughes, Bowyer, Waters, & Bourne, 1976; Weinberg & Bennet, 1971) indicate that under certain circumstances vocal tract resonances can provide a reliable cue to the sex of a speaker. In the normal, unaltered voice, the vocal fundamental frequency is the overriding cue to speaker sex (Coleman, 1976). However, when the fundamental frequency is equalized across sexes, either by whispering or by substituting the tone of a laryngeal vibrator for the normal glottal tone, the sex of the speaker can still be easily recognized (Brown & Feinstein, 1977; Coleman, 1971; Lass et al., 1976). It has also been demonstrated that these identifications are based on the frequencies of the vocal tract resonances, as expressed by the averages of the vowel formant frequencies: when the normal fundamental frequency differences had been equalized, speakers with higher vowel formants were perceived as female and those with lower formants as male (Coleman, 1971).
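Why a longer tract lowers every resonance can be sketched with the textbook uniform-tube approximation: a tube closed at the glottis and open at the lips resonates at F_n = (2n - 1)c / (4L). The speed of sound (about 350 m/s), the nominal 17.5 cm length, and the function names are assumptions for illustration. A uniform tube only approximates a neutral vowel, and it ignores the non-uniform scaling between male and female formant patterns that Fant describes.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed: roughly the speed of sound in warm, humid air

def uniform_tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other:
    F_n = (2n - 1) * c / (4 * L). Every formant scales as 1 / L."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]

def percent_formant_shift(length_a_cm, length_b_cm):
    """Percent by which tube A's formants lie above tube B's.
    Because F is proportional to 1 / L, this depends only on the length ratio."""
    return 100.0 * (length_b_cm / length_a_cm - 1.0)

male_like = uniform_tube_formants(17.5)  # [500.0, 1500.0, 2500.0]
shorter = 17.5 / 1.2                     # a hypothetical tract 1/1.2 as long
shift = percent_formant_shift(shorter, 17.5)  # approximately 20
```

Under this model, a tract 20% longer places all of its formants at 1/1.2 of the shorter tract's values, so the shorter tract's formants are about 20% higher, regardless of which vowel-independent length is chosen.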
When the acoustic cues were artificially mixed, i.e., a female fundamental modulated by a male vocal tract and vice versa, the effectiveness of the vocal fundamental as a cue to the sex of the speaker was significantly weakened by the presence of vocal tract resonances associated with the other sex (Coleman, 1976).
The available evidence indicates, therefore, that a female vocal fundamental modulated by a male vocal tract can be expected to retain a degree of male quality to which listeners are perceptually sensitive. This is, of course, the situation in which a male who wishes to have a convincing female voice quality finds himself.
The fundamental frequency can be raised with relative ease, but since the dimensions of the vocal tract are fixed, males are often unable to completely eliminate the male quality in their voice.
In view of this, it is recommended that persons contemplating a male-to-female sex change be informed that they may be unable to eliminate the male quality from their voice entirely. Although some male-to-female transsexuals may wish to retain a degree of male voice quality, recognizing that a potentially limiting factor exists may lead to more realistic expectations.
1. Formants are the resonances of the vocal tract and contain the acoustic information that conveys vowel identifiability.
2. Schaefer (Note 1) has noted that male-to-female transsexuals from the South find it easier to change their voices. In Schaefer's opinion, this is due to the wider variety of inflectional patterns used by males with Southern accents.
Reference Note
1. SCHAEFER, L. C. Personal communication, November 1982.
References
BROWN, W. S., & FEINSTEIN, S. H. Speaker sex identification using a constant laryngeal source. Folia Phoniatrica, 1977, 29, 240-248.
COLEMAN, R. O. Male and female voice quality and its relationship to vowel formant frequencies. Journal of Speech and Hearing Research, 1971, 14, 565-577.
COLEMAN, R. O. A comparison of the contributions of two voice quality characteristics to the perception of maleness and femaleness in the voice. Journal of Speech and Hearing Research, 1976, 19, 168-180.
FANT, G. A note on vocal tract size factors and non-uniform F-pattern scalings. Speech Transmission Laboratory, Quarterly Progress and Status Report, 1981, 21-38.
LASS, N. J., HUGHES, K. R., BOWYER, M. D., WATERS, L. T., & BOURNE, V. T. Speaker sex identification from voiced, whispered and filtered isolated vowels. Journal of the Acoustical Society of America, 1976, 59, 675-678.
LIBERMAN, P. Speech physiology and acoustic phonetics: An introduction. New York: MacMillan, 1977.
PETERSON, G. E., & BARNEY, H. L. Control methods used in a study of the identification of vowels. Journal of the Acoustical Society of America, 1952, 24, 175-184.
WEINBERG, B., & BENNET, S. A study of talker recognition of esophageal voices. Journal of Speech and Hearing Research, 1971, 14, 391-396.

Accepted for publication September 30, 1982.
Ralph O. Coleman, PhD, is an Associate Professor of Speech Pathology in the Department of Public Health and the Crippled Children's Division of The Oregon Health Sciences University.
Requests for reprints should be sent to Ralph O. Coleman, PhD, Crippled Children's Division, Child Development and Rehabilitation Center, P.O. Box 574, The Oregon Health Sciences University, Portland, OR 97207.