From FEMINET, Felton CA 408-335-4387 or 408-335-7888
Changing the Vocal Characteristics of a Postoperative
Transsexual Patient: A Longitudinal Study
Kay H. Mount and Shirley J. Salmon
Audiology and Speech Pathology Service,
Veterans Administration Medical Center,
Kansas City, Missouri
The vocal characteristics of a 63-year-old individual who underwent male-
to-female sex reassignment surgery were evaluated. Treatment was designed to
alter inappropriate male voice characteristics. Speech goals were to (1)
encourage use of successively higher pitch levels, and (2) modify tongue
carriage to change resonance. After 11 months of therapy, average fundamental
frequency for /i, a, u/ vowels changed from 110 to 205 Hz. Also, second
formant frequency values changed remarkably for each of these vowels, with the
greatest frequency change being 291 Hz for /i/. These acoustic differences
could account for the perception of femininity in her posttreatment voice.
Maintenance of these acoustic features was found five years posttreatment.
Address correspondence to Kay H. Mount, Ph.D., Audiology and Speech
Pathology Service (126), Kansas City VA Medical Center, 4801 Linwood Blvd.,
Kansas City, MO 64128.
INTRODUCTION
Because of differences in laryngeal size and mass, average fundamental
frequency (F0) for females is higher (220 Hz) than for males (110 Hz). The
perceived pitch of the laryngeal fundamental has long been accepted as an
acoustic cue to speaker sex. Thus, for male-to-female transsexuals,
fundamental frequency must change for the perception of a female voice. Baker
and Green (1970) report that a common misconception about male-to-female sex
reassignment surgery is that castration or use of estrogen will raise vocal
pitch. That surgery or use of estrogen has little effect on vocal pitch was
demonstrated by Wolfe, Ratusnik, and Northrop (1980). They investigated vocal
characteristics of 20 male-to-female transsexuals all of whom had been on
hormone treatment for various amounts of time, but only one of whom had
undergone surgery. Reportedly, "mean fundamental frequency of transsexual
speakers (93-202 Hz) covered a broad range, overlapping those of male and
female speakers" (p. 473). These data demonstrated that this group of
patients exhibited fundamental frequencies that might be expected from the
normal male population. Their fundamental frequencies did not skew to the
right as might be predicted if fundamental frequency was affected by either
surgery or estrogen.
Those who have treated male-to-female transsexuals have recognized the
importance of changing fundamental frequency. Bralley, Bull, Gore, and
Edgerton (1978) as well as Kalra (1977) presented data that illustrated how
one male-to-female transsexual elevated fundamental frequency following vocal
rehabilitation. The individual described by Bralley et al. (1978) elevated
fundamental frequency from 145 to 165 Hz. Although the voice was higher in
pitch and judged more feminine, the investigators reported that it could still
be distinguished from female voices. These findings suggest that alteration
of fundamental frequency alone is not enough to achieve perception of feminine
gender. Kalra's patient raised fundamental frequency from 168 to 200 Hz and
used Froeschels' chewing method (1952) in an attempt to increase anterior oral
resonance. Kalra recognized the importance of accentuating anterior oral
resonance to accommodate the newly acquired higher pitch; however, he did not
provide data to substantiate change in vocal tract resonance.
Other investigators also have recognized the need to alter vocal parameters
besides fundamental frequency. Wolfe, Ratusnik, and Northrop (1980) reported
high negative correlations between ratings on a femininity-masculinity scale
and means of five vocal characteristics (F0, formant frequency, extent of
upward inflections, extent of downward inflections, and extent of both upward
and downward inflections). These findings as well as similar observations
reported by Pronovost (1942), Snidecor (1951), and Coleman (1971, 1976)
provide further support for the notion that other acoustic parameters in
addition to frequency influence gender identification.
Vocal tract resonance characteristics may be the second most important
acoustic cue to speaker identification. Peterson and Barney (1952) and
Ladefoged and Broadbent (1957) found that females have higher average vowel
formant frequencies than males. The importance of vocal tract resonances as a
cue to speaker sex identification was shown by Coleman (1971). He reported
that listeners correctly identified speaker sex 88% of the time when listening
to both sexes produce artificial larynx speech having a fundamental of 85 Hz.
In 1976 Coleman investigated further the importance of vocal tract resonance
and fundamental frequency on gender identification. In one experiment, male
and female speakers produced speech samples using normal voice and in another
they used an artificial larynx. When speakers used normal voice, vocal tract
resonance and fundamental frequency were both important to male-female
identification. When they used artificial voice and when vocal tract
resonance characteristics of one sex were combined with F0 characteristics of
the opposite sex, listeners generally identified the speaker as male. This
was true whether a male F0 was combined with female vocal tract resonance or
whether a female F0 was combined with male vocal tract resonance.
These cumulative findings lead to the hypotheses that for male-to-female
transsexuals (1) raising the F0 alone will likely result in perception of male
voice, and (2) raising the F0 and the vocal tract resonance simultaneously
will likely result in the perception of female voice. To test these
hypotheses, a treatment plan was devised that focused on changing both the
laryngeal tone and its resonance. Consequent frequencies were measured.
Successful procedures for increasing fundamental frequency of male-to-
female transsexuals' voices were well established; but procedures for changing
vocal tract resonance were not. Yet, resonance of the vocal tract can be
altered at will and is demonstrated during vowel production. According to
Fant (1956), vowels are primarily the product of the voice source and the
filtering action of the vocal tract. The resonant frequencies of the vocal
tract, F1, F2, etc., are determined by characteristics of the tract including
size and shape (Delattre, 1951; Fant, 1962). Narrowing of the oral-pharyngeal
cavities during production of a high front vowel such as /i/ is produced by
elevation of the mandible, which results in low F1 values, and anterior tongue
carriage, which results in high-frequency resonance for F2. Resonance
produced with excessive anterior tongue carriage has been referred to as
"thin" by Boone (1971) and Fisher (1975). Here, "thin vocal tract resonance"
and "upward movement of the second vowel formant frequencies" are used
synonymously and are thought to be the result of anterior tongue carriage.
Thus, upward and downward movements of F2 were chosen to represent change in
vocal tract resonance for this study. In therapy, the patient's natural
abilities were used to produce voluntarily a higher laryngeal fundamental and
to enhance it by forward carriage of the tongue, thereby raising second
formant vowel frequency. Pre- and posttreatment frequency measurements of the
fundamental and the vowel second formant were made to quantify treatment
results.
The purpose of this report is to describe the treatment provided to a
postoperative male-to-female transsexual, present acoustic data from
pretreatment and posttreatment voice samples, and speculate about the
relationship between acoustic change and perception of feminine voice.
PROCEDURES
Equipment
Equipment used during diagnosis and treatment included (1) Kay Elemetrics
Visi-pitch, model 6087A, attached to a Tektronix oscilloscope, model 5113; (2)
Voice Identification 700 Series sound spectrograph with attached full track
reel-to-reel Crown tape recorder, model IM7; (3) portable Sony cassette tape
recorder; (4) Philco minicassette recorder; (5) Bell and Howell Language
Master; and (6) Sony U-matic video recorder and camera. The Visi-pitch was
used to assess fundamental frequency and relative intensity of utterances and
to provide visual feedback regarding these parameters. A Language Master and
tape recorders were used to provide auditory feedback, while the video
recorder was used to provide both auditory and visual feedback. The sound
spectrograph was used for acoustic analysis of selected speech samples over
time.
Evaluation
The patient was a 63-year-old individual who began hormone treatment a year
before undergoing male-to-female reassignment surgery. Six months after
surgery she came to the speech clinic for evaluation. Her chief complaint was
a low-pitched voice, which was not perceived as female, especially over the
telephone.
The patient was audio- and video-recorded throughout the diagnostic
session. During conversational speech and production of sustained vowels, she
spoke in a full, resounding, low-pitched voice appropriate for a bass male
speaker as confirmed by measurement of fundamental frequency from
spectrograms. Average fundamental frequency of sustained /i, a, u/ was 110 Hz
with fundamental frequency at the lowest and highest pitch levels ranging from
110 to 340 Hz. In conversational speech prosodic patterns and voice quality
were judged normal for a male; however, hard glottal attacks were noted
occasionally. When asked to demonstrate the female-type voice she had been
attempting to develop on her own, the patient's vocal characteristics changed
remarkably. Although pitch was higher, upward and downward inflection
patterns were bizarre and not consistent with sentence structure. The speech
resembled that of a male amateur comedian trying to imitate a female.
The patient also was seen by ENT for indirect laryngoscopy. The structure
and function of the vocal folds were judged normal.
Treatment
Goals. The overall goal of treatment was to train the patient to effect at
will a voice that was perceived as feminine. Primary goals were to (1) train
successively higher pitch levels while avoiding vocal abuse, and (2) modify
tongue carriage to achieve higher resonance characteristics of the vocal
tract. Secondary goals were to (1) promote a breathy vocal attack, and (2)
establish appropriate inflection patterns at higher pitch levels. The
rationales were (1) higher fundamental frequency is associated with feminine
voice; (2) higher resonance characteristics are associated with female voices;
(3) breathy vocal attacks contribute to vocal health and are considered by
some (Money and Primrose, 1969) to exemplify feminine gender; and (4)
inappropriate inflection patterns call attention to the voice as different and
unnatural.
Stimulus. Stimulus materials were selected or constructed to achieve the
specified goals. To increase fundamental frequency and change resonance
characteristics, words that contained high front vowels and anterior
consonants were identified. To encourage easy onset of phonation and the
adoption of a breathy voice quality, other words beginning with /h/ were
selected. Using these two groups of words, phrases and sentences were
constructed to represent various types of intonation patterns.
Methods. Initially, the patient was required to listen to the female
clinician's production, study the Visi-pitch display, and attempt to match the
pitch contours. From the beginning of treatment, the middle third of the
patient's frequency range was established as the target fundamental. In
general, frequency was raised in increments of 10 Hz until consistently good
vocal quality could be maintained, using an average fundamental frequency of
210 Hz. Because rising inflection patterns were easier for the patient to
imitate, they were used in the early period of treatment. To help attain
high-frequency resonance, the patient was encouraged to listen to the quality
of voice and note the elevation of the mandible and anterior tongue carriage
when producing words with high front vowels and anterior consonants. She was
directed to maintain this articulatory positioning throughout an utterance to
effect higher resonance that was labeled "thin" for qualitative purposes.
Breathiness on words beginning with /h/ and, later, on words beginning with
vowels was established by monitoring rise time of intensity patterns displayed
on the Visi-pitch. Easy onset of voicing was encouraged at all times.
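As a rough worked example of the target-setting step, the middle third of a pitch range can be computed directly. The sketch below assumes the range is divided linearly in Hz (the report does not say whether a linear or a musical scale was used) and borrows the 110 to 340 Hz range noted at evaluation; the function name `middle_third` is illustrative only.

```python
def middle_third(low_hz, high_hz):
    """Return (lower, upper) bounds of the middle third of a range.

    Assumption: the range is split into three equal parts on a linear
    Hz scale; a semitone-based split would give different bounds.
    """
    step = (high_hz - low_hz) / 3.0
    return low_hz + step, high_hz - step


# Pitch range reported at evaluation: 110 to 340 Hz.
lower, upper = middle_third(110, 340)
print(f"middle third: {lower:.0f} to {upper:.0f} Hz")

# Check whether the 210 Hz average reached in treatment lies in this band.
print(lower <= 210 <= upper)
```

Under this linear assumption the band runs from roughly 187 to 263 Hz, which contains the 210 Hz average reported for treatment.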
A variety of intonation patterns was practiced while maintaining increased
fundamental frequency, high frequency resonance, and breathy quality. Such
practice was necessary to overcome inappropriate patterns adopted by the
patient in her pretherapy attempts to develop a feminine voice. These
behaviors were established first in the clinic and, later, during "self-
modeling" home practice using audio recordings of her best efforts in therapy.
In the final months of therapy, role-play situations stressing functional
conversations were practiced both in and outside the clinic. Assignments
outside the clinic required conversations in person and over the telephone
with people unknown to the patient. These conversations were recorded to
assess appropriateness of vocal behaviors. The patient did not consider her
speech and voice acceptable until she was referred to as "Ma'am" over the
telephone. Thus, work with the telephone continued until feminine references
predominated.
Prior to the end of treatment an otolaryngological exam indicated normal
supraglottic and glottic structures at rest and during phonation. Treatment
was terminated following 88 1-hour sessions over an 11-month period.
Maintenance was evaluated five years posttreatment.
Measurement
Broad band (300 Hz) and narrow band (45 Hz) amplitude cross-section
spectrograms were produced for /i, a, u/ vowels at the beginning and end of
treatment and five years thereafter. Narrow band sections were made in the
middle of the vowel at the most stationary portion of the second formant. The
center frequencies of formants 1, 2, and 3 were judged to be equal to the
frequency of the maximum harmonic of the first, second, and third spectral
envelopes, respectively, or at a point half-way between two adjacent high-
amplitude harmonics when two relatively equal central harmonics were present.
Fundamental frequency was estimated by counting the vertical striations
(representing the laryngeal pitch periods) in a 100-msec segment which
corresponded to the most stationary portion of F2. Starrett model 120Z
machinist's dial calipers were used to measure F0 and F1, F2, and F3. The
formulas were as follows:
1. F0 = N(10) where N is the number of pitch periods in a 100-msec
segment.
2. F1, F2, or F3 = (X/3 + Y) / (X/3000)
where X is the measurement in thousandths of an inch from 1000 to 4000 Hz
calibration marks on a particular spectrogram, and Y is the caliper reading
for the formant being measured.
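The two formulas can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: it reads the numerator of the second formula as X/3 + Y (the only reading that yields a result in Hz), the function names are hypothetical, and the 1500-thousandths calibration spacing in the usage lines is an invented value chosen only for round numbers.

```python
def f0_from_striations(n_periods, window_ms=100):
    """Formula 1: pitch periods counted in a window, scaled to Hz.

    With the 100-msec window used in the report this is N * 10.
    """
    return n_periods * (1000.0 / window_ms)


def formant_from_caliper(x_thou, y_thou):
    """Formula 2, read as (X/3 + Y) / (X/3000).

    x_thou: distance between the 1000 and 4000 Hz calibration marks,
            in thousandths of an inch.
    y_thou: caliper reading for the formant, in thousandths of an inch.
    Assumption: the two terms in the numerator are added.
    """
    return (x_thou / 3.0 + y_thou) / (x_thou / 3000.0)


def hz_per_thousandth(x_thou):
    """Frequency spanned by one thousandth of an inch on the spectrogram."""
    return 3000.0 / x_thou


# Eleven striations in 100 msec correspond to 110 Hz.
print(f0_from_striations(11))

# Hypothetical spectrogram with 1.5 inches between calibration marks.
X = 1500
print(formant_from_caliper(X, 0))      # a reading of zero maps to 1000 Hz
print(formant_from_caliper(X, 1500))   # a reading of X maps to 4000 Hz
print(4 * hz_per_thousandth(X))        # 0.004 inch expressed in Hz
```

With this hypothetical spacing each thousandth of an inch spans 2 Hz, so a caliper discrepancy of 0.004 inch corresponds to 8 Hz.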
Two judges trained in acoustic analysis independently measured F0 and F1-F3
for each vowel. When examining the broad band spectrograms the two judges
agreed 100% of the time on the number of vertical striations present within
each 100-msec segment. When measuring the formants, the two judges agreed
100% of the time on selection of the point representing the peak of each
spectral envelope. Because good reliability was obtained, formant
measurements of only one of the judges were used. On repeated measurement of
the entire sample by this judge, the greatest difference in dial caliper
readings was ±0.004 inch or 8 Hz at 1053 Hz and 2092 Hz for F2 of /a/ and /i/,
respectively. When dial caliper readings were compared with a second judge,
the greatest difference in readings was 0.006 inch or 11 Hz at 3467 Hz for F3
of /i/. These differences in the caliper readings were regarded as
insignificant because such small changes in these formants are probably
linguistically irrelevant when submitted to listeners for perceptual judgments
(Flanagan, 1955; Mermelstein and Fitch, 1976).
RESULTS
Table 1 provides frequency values for F0-F3 for /i, a, u/ vowels at the
initiation (T1), termination (T2), and five years following (T3) treatment.
In general, as fundamental frequency increased, formant frequencies
increased. For example, five years posttreatment the mean for F0 increased to
222 Hz and F2 for all vowels was at its highest level.
Figure 1 shows that at the beginning of treatment average fundamental
frequency for /a/ was 110 Hz and after four months of treatment it increased
to 210 Hz. F0 stabilized at 210 Hz throughout the remainder of treatment.
The shaded area represents average F0 values of /a/ for males and females
studied by Peterson and Barney (1952).
Figure 2 shows second formant frequency values for /i, a, u/ at each of the
three time periods (T1-T3). The lower and upper limits of the shaded areas in
the figure depict male and female average F2 values for these vowels as
reported by Peterson and Barney. Note that the patient's F2 values for all
vowels illustrate a constant rise toward the female frequencies. When F2
values for the beginning and end of treatment were compared, F2 for /i/
increased the most (291 Hz) followed by /u/ (255 Hz) and /a/ (94 Hz).
Posttreatment F2 values continued to increase but at a lesser rate, with the
greatest change occurring for /u/ (103 Hz), followed by /a/ (44 Hz), and the
smallest change occurring for /i/ (6 Hz).
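The differences quoted above can be checked against Table 1 with a few lines of Python. The values are copied from the table; the dictionary layout and the function names are illustrative only.

```python
# F0 and F2 values (Hz) from Table 1, keyed by time point and vowel.
F0 = {
    "T1": {"i": 110, "a": 110, "u": 110},
    "T2": {"i": 195, "a": 210, "u": 210},
    "T3": {"i": 235, "a": 230, "u": 200},
}
F2 = {
    "T1": {"i": 2092, "a": 1053, "u": 810},
    "T2": {"i": 2383, "a": 1147, "u": 1065},
    "T3": {"i": 2389, "a": 1191, "u": 1168},
}


def change(table, start, end):
    """Per-vowel difference in Hz between two time points."""
    return {v: table[end][v] - table[start][v] for v in table[start]}


def mean_f0(time):
    """Mean F0 across the three vowels at one time point."""
    values = F0[time].values()
    return sum(values) / len(values)


during = change(F2, "T1", "T2")   # change over the course of treatment
after = change(F2, "T2", "T3")    # change in the five years afterward

print(during)
print(after)
print([round(mean_f0(t)) for t in ("T1", "T2", "T3")])
```

The script reproduces the F2 increases of 291, 94, and 255 Hz for /i/, /a/, and /u/ during treatment, the later increases of 6, 44, and 103 Hz, and mean F0 values of 110, 205, and about 222 Hz at the three time points.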
Table 1. F0-F3 in Hertz for /i, a, u/ Vowels at Initiation (T1),
Termination (T2), and Five Years after (T3) Treatment

Time   Vowel    F0     F1     F2     F3
T1     /i/     110    222   2092   3467
T1     /a/     110    606   1053   2331
T1     /u/     110    277    810   1944
T2     /i/     195    278   2383   3505
T2     /a/     210    829   1147   2718
T2     /u/     210    322   1065   2458
T3     /i/     235    371   2389   3363
T3     /a/     230    955   1191   2853
T3     /u/     200    389   1168   2151
Figure 1. Mean fundamental frequency of /a/ at specified recording times
compared to male and female mean values (Peterson and Barney, 1952)
represented by lower and upper limits of shaded area.
Figure 2. Patient's second formant values for /i, a, u/ vowels at
initiation (T1), termination (T2), and 5 years following (T3) treatment
compared to male and female mean values (Peterson and Barney, 1952)
represented by lower and upper limits of shaded areas.
DISCUSSION
Although a fundamental frequency comparable to that of females was obtained
after four months of therapy, the patient was not perceived as female on the
telephone. This did not occur until six months later, near the termination of
treatment. It is not surprising that she was still perceived as male because
alteration of fundamental frequency alone is not enough to achieve perception
of feminine gender (Coleman, 1976; Bralley, Bull, Gore, and Edgerton, 1978).
Although treatment for the present study was directed toward increasing both
F0 and F2, F0 values had reached those appropriate for females within the
first four months, while F2 had not (see Figures 1 and 2). At the
initiation of treatment, the patient's F2 values for the three vowels were
below the means for male speakers studied by Peterson and Barney (1952).
Midway through treatment, the patient's F2 values crossed the male frequency
averages and began to rise toward the female means. At the end of treatment,
the patient's F2 values had exceeded the female means for /u/, were halfway
between the male and female averages for /a/, and were about one-third of the
way between the male and female means for /i/. Although the patient did not
achieve female values for F2 in every instance, her resonant characteristics
when coupled with the feminine F0 apparently were sufficiently close to those
for females to elicit feminine perception. She maintained female vocal
characteristics five years posttreatment and reported continued success in
being perceived as a female over the telephone.
There seems to be a range of acceptable fundamental and formant frequencies
necessary for identification of voice as female. The patient first achieved
F0 frequencies appropriate for females, then gradually achieved F2 values for
this gender. Thus, by the end of treatment F2 values were perceptually
consonant with those for F0 allowing perception of feminine voice when visual
clues were not available. It is assumed that perception of femininity was
accomplished by effecting a change in the resonance cavity through forward
movement of the tongue. This treatment study lends support to the hypothesis
that when a disparity exists between a female F0 and a male F2 in the
transsexual voice, perception follows F2 and that only when consonance occurs
between F0 and F2 will perception follow the female F0.
REFERENCES
Baker, H., and Green, R. (1970). Treatment of transsexualism. Curr.
Psychiatric Ther. 10:88-89.
Boone, D. (1971). The Voice and Voice Therapy. Englewood Cliffs, NJ:
Prentice-Hall.
Bralley, R., Bull, G., Gore, C., and Edgerton, M. (1978). Evaluation of
vocal pitch in male transsexuals. J. Commun. Disord. 11:443-449.
Coleman, R. O. (1971). Male and female voice quality and its relationship
to vowel formant frequencies. J. Speech Hear. Res. 14:565-577.
Coleman, R. O. (1976). A comparison of the contributions of two voice
quality characteristics to the perception of maleness and femaleness in the
voice. J. Speech Hear. Res. 19:168-180.
Fant, C. (1956). On the predictability of formant levels and spectrum
envelopes from formant frequencies. In M. Halle, H. Lunt, and H. MacLean
(eds.), For Roman Jakobson. The Hague: Mouton.
Fant, C. (1962). Descriptive analysis of the acoustic aspects of speech.
Logos, 5:3-17.
Fisher, H. (1975). Improving Voice and Articulation. Utica: H. M.
Cardamone.
Flanagan, J. (1955). A difference limen for vowel formant frequency. J.
Acoust. Soc. Am. 27:765-768.
Froeschels, E. (1952). Chewing method as therapy. Arch. Otolaryng.
56:427-434.
Kalra, M. (1977). Voice therapy with a transsexual. In R. Gemme and C.
Wheeler (eds.), Progress in Sexology. New York: Plenum Press.
Ladefoged, P., and Broadbent, D. (1957). Information conveyed by vowels. J.
Acoust. Soc. Am. 29:98-104.
Mermelstein, P., and Fitch, H. (1976). Difference limens for formant
frequencies for steady state and consonant-bounded vowels. Paper presented at
the 92nd meeting of the Acoustical Society of America.
Money, J., and Primrose, C. (1969). Sexual dimorphism and dissociation in
the psychology of male transsexuals. In R. Green and J. Money (eds.),
Transsexualism and Sex Reassignment. Baltimore: Johns Hopkins.
Peterson, G., and Barney, H. (1952). Control methods used in a study of the
vowels. J. Acoust. Soc. Am. 24:175-184.
Pronovost, W. (1942). An experimental study of methods for determining
natural and habitual pitch. Speech Monogr. 9:111-123.
Snidecor, J. (1951). The pitch and duration characteristics of superior
female speakers during oral reading. J. Speech Hear. Disord. 16:44-52.
Wolfe, V. I., Ratusnik, D. L., and Northrop, G. (1980). Vocal
characteristics of male transsexuals on a masculinity-femininity dimension. In
The Proceedings of the Congress of the International Association of
Logopedics and Phoniatrics. Vol 1, pp. 469-474.