Acoustic Correlates of Speaker Sex Identification: Implications for the Transsexual Voice
Author: Ralph O. Coleman
Counselors who number transsexuals
among their clients often report that the gender characteristic most resistant to
convincing change is the voice. This seems particularly true with individuals for whom the
sexual change has been that of male to female. Although it would seem that simply raising
the pitch of the voice to the range appropriate for females would be effective, the fact
is that under those circumstances a distinct male voice quality often persists (Coleman,
1976).
This seeming paradox is understandable in the context of vocal acoustics. A predominant
theory on this subject, commonly referred to as the "source-filter" theory
(Fant, 1980; Liberman, 1977), states that the voice consists of two major acoustic
components: one, the tone generated by the larynx, and two, the modulating effect on the
tone of the vocal tract acting as a series of acoustically coupled resonators.
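In frequency-domain terms, this theory is often summarized as a product of two spectra. The notation below is a conventional shorthand and is not taken from the studies cited here: S(f) stands for the spectrum of the tone generated by the larynx, T(f) for the transfer function of the vocal tract, and P(f) for the resulting speech spectrum. The effect of sound radiation at the lips is ignored in this sketch.

```latex
\[
  |P(f)| = |S(f)| \cdot |T(f)|
\]
```

Under this simplified view, raising the pitch of the voice alters only S(f), while the resonance pattern T(f) imposed by the vocal tract stays as it was.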
The mechanics of the first component of this theory are well understood. In brief, the
approximated vocal folds are forcibly blown apart by the build-up of air pressure from the
lungs and a puff of air is released. The folds then re-approximate themselves through a
combination of forces including tissue elasticity, the reduction of subglottic air
pressure that occurs when the folds are blown apart, and an aerodynamic effect which draws
the folds to the midline. With the folds re-approximated, subglottic air pressure again
builds up to the point where it overcomes the resistance provided by the vocal folds, and
the process is repeated. The series of air puffs that results assumes a tonal quality whose
periodicity is determined by a complex of factors, including the mass and tension of the
vocal folds and the degree of subglottic air pressure. Because the larger mass of the male
vocal folds provides greater resistance to being blown apart, the vibrating cycle is
slightly slower than in females and the lower pitch that results is a prominent perceptual
feature of the male voice.
The second acoustic component in the voice results from the action of the vocal tract on
the tone generated at the glottis. The vocal tract is a continuous tube extending from the
glottis to the lips. Because of the larger overall size of adult males, the male vocal
tract averages about 20% longer than that of females (Fant, 1966). In speech, the
resonating cavities in the vocal tract "shape" the glottal tone in ways
consistent with the laws of acoustic resonance. Since the male vocal tract is slightly
longer than in females, the frequencies at which vocal energy is concentrated in the male
speech signal can be expected to be somewhat lower. The magnitude of this difference, as
expressed in vowel formants,¹ is about 20% (Peterson & Barney, 1952) for an adult
population. This physiological difference has long been recognized. However, its
perceptual prominence has only recently been examined. Several studies (Brown &
Feinstein, 1977; Coleman, 1971, 1976; Lass, Hughes, Bowyer, Waters, & Bourne, 1976;
Weinberg & Bennet, 1971) indicate that under certain circumstances, vocal tract
resonances can provide a reliable cue to the sex of a speaker. In the normal, unaltered
voice, the vocal fundamental is the overriding cue to speaker sex (Coleman, 1976).
However, when the vocal fundamental frequency is equalized across sex either by whispering
or by substituting the tone generated by a laryngeal vibrator for the normal glottal tone,
the sex of the speaker can still be easily recognized (Brown & Feinstein, 1977;
Coleman, 1971; Lass et al., 1976). It has also been demonstrated that these
identifications are based on the frequency of the vocal tract resonances as expressed by
the vowel formant frequency averages. Speakers with higher vowel formants were perceived
as being female and those with lower formants as males when the normal fundamental
frequency differences had been equalized (Coleman, 1971).
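The link between vocal tract length and formant frequency can be illustrated with the textbook idealization of the vocal tract as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of a quarter wavelength. The sketch below is only that idealization, not a method used in the studies cited: the speed of sound, the function name `tube_formants`, and both tube lengths are assumptions chosen for illustration, with the female length set so that the male tube is 20% longer.

```python
# Illustrative only: a uniform tube closed at the glottis and open at the lips.
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed round value for warm, moist air


def tube_formants(length_cm, count=3):
    """Return the first `count` quarter-wavelength resonances (Hz) of the tube."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Hypothetical lengths: the male value is an assumed round figure, and the
# female value is set so that the male tube is 20% longer.
MALE_LENGTH_CM = 17.5
FEMALE_LENGTH_CM = MALE_LENGTH_CM / 1.2

male_formants = tube_formants(MALE_LENGTH_CM)
female_formants = tube_formants(FEMALE_LENGTH_CM)

for index, (male_hz, female_hz) in enumerate(
    zip(male_formants, female_formants), start=1
):
    print(
        f"F{index}: male {male_hz:.0f} Hz, female {female_hz:.0f} Hz, "
        f"ratio {female_hz / male_hz:.2f}"
    )
```

In this idealization every resonance of the shorter tube is higher by the same factor as the length ratio. Real vocal tracts are not uniform tubes, so measured formant differences vary from vowel to vowel, and the figures above should be read as a rough guide only.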
When the acoustic cues were artificially mixed, i.e., a female fundamental being modulated
by a male vocal tract and vice versa, the perception of the vocal fundamental as the cue
to the sex of the speaker was significantly weakened by the presence of vocal tract
resonances associated with the other sex (Coleman, 1976).
The available evidence would indicate, therefore, that a female vocal fundamental
modulated by a male vocal tract can be expected to retain a degree of male quality to
which listeners are perceptually sensitive. This is, of course, the situation in which a
male who wishes to have a convincing female voice quality finds himself.
The fundamental frequency can be raised with relative ease, but since the dimensions of
the vocal tract are fixed, males are often unable to completely eliminate the male quality
in their voice.
In view of this, it is recommended that persons contemplating a male-to-female sex change
be informed that they may be unable to totally eliminate the male quality from their
voice. Although some male-to-female transsexuals may wish to maintain a degree of male
voice quality, recognition of the existence of a potentially limiting factor may result in
more realistic expectations.
1. Formants are the resonances of the vocal tract and contain the acoustic information
that conveys vowel identifiability.
2. It has been noted by Schaefer (Note 1) that male-to-female transsexuals from the South
find it easier to change their voice. It is Schaefer's opinion that this is secondary to
the use of a wider variety of inflectional patterns in males with Southern accents.
Reference Note
1. SCHAEFER, L. C. Personal communication, November, 1982.
References
BROWN, W. S., & FEINSTEIN, S. H. Speaker sex identification using a constant laryngeal
source. Folia Phoniatrica, 1977, 29, 240-248.
COLEMAN, R. O. Male and female voice quality and its relationship to vowel formant
frequencies. Journal of Speech and Hearing Research, 1971, 14, 565-577.
COLEMAN, R. O. A comparison of the contributions of two voice quality characteristics to
the perception of maleness and femaleness in the voice. Journal of Speech and Hearing
Research, 1976, 19, 168-180.
FANT, G. A note on vocal tract size factors and non-uniform F-pattern scalings. Speech
Transmission Laboratory, Quarterly Progress and Status Report, 1981, 21-38.
LASS, N. J., HUGHES, K. R., BOWYER, M. D., WATERS, L. T., & BOURNE, V. T. Speaker sex
identification from voiced, whispered and filtered isolated vowels. Journal of the
Acoustical Society of America, 1976, 59, 675-678.
LIBERMAN, P. Speech physiology and acoustic phonetics: An introduction. New York:
MacMillan, 1977.
PETERSON, G. E., & BARNEY, H. L. Control methods used in a study of the identification
of vowels. Journal of the Acoustical Society of America, 1952, 24, 175-184.
WEINBERG, B., & BENNET, S. A study of talker recognition of esophageal voices. Journal
of Speech and Hearing Research, 1971, 14, 391-396.
Accepted for publication September 30, 1982.
Ralph O. Coleman, PhD, is an Associate Professor of Speech Pathology in the Department of
Public Health and the Crippled Children's Division of The Oregon Health Sciences
University.
Requests for reprints should be sent to Ralph O. Coleman, PhD, Crippled Children's
Division, Child Development and Rehabilitation Center, P.O. Box 574, The Oregon Health
Sciences University, Portland, OR 97207.