From FEMINET, Felton CA 408-335-4387 or 408-335-7888

Changing the Vocal Characteristics of a Postoperative
 Transsexual Patient: A Longitudinal Study 
   Kay H. Mount and Shirley J. Salmon 
   Audiology and Speech Pathology Service, 
   Veterans Administration Medical Center, 
   Kansas City, Missouri 

   The vocal characteristics of a 63-year-old individual who underwent male-
to-female sex reassignment surgery were evaluated.  Treatment was designed to 
alter inappropriate male voice characteristics.  Speech goals were to (1) 
encourage use of successively higher pitch levels, and (2) modify tongue 
carriage to change resonance.  After 11 months of therapy, average fundamental 
frequency for /i, a, u/ vowels changed from 110 to 205 Hz.  Also, second 
formant frequency values changed remarkably for each of these vowels, with the 
greatest frequency change being 291 Hz for /i/.  These acoustic differences 
could account for the perception of femininity in her posttreatment voice.  
Maintenance of these acoustic features was found five years posttreatment. 
   Address correspondence to Kay H. Mount, Ph.D. Audiology and Speech 
Pathology Service (126), Kansas City VA Medical Center, 4801 Linwood Blvd., 
Kansas City, MO 64128. 
   Because of differences in laryngeal size and mass, average fundamental 
frequency (F0) for females is higher (220 Hz) than for males (110 Hz).  The 
perceived pitch of the laryngeal fundamental has long been accepted as an 
acoustic cue to speaker sex.  Thus, for male-to-female transsexuals, 
fundamental frequency must change for the perception of a female voice.  Baker 
and Green (1970) report that a common misconception about male-to-female sex 
reassignment surgery is that castration or use of estrogen will raise vocal 
pitch.  That surgery or use of estrogen has little effect on vocal pitch was 
demonstrated by Wolfe, Ratusnik, and Northrop (1980).  They investigated vocal 
characteristics of 20 male-to-female transsexuals all of whom had been on 
hormone treatment for various amounts of time, but only one of whom had 
undergone surgery.  Reportedly, "mean fundamental frequency of transsexual 
speakers (93-202 Hz) covered a broad range, overlapping those of male and 
female speakers" (p. 473).  These data demonstrated that this group of 
patients exhibited fundamental frequencies that might be expected from the 
normal male population.  Their fundamental frequencies did not skew to the 
right as might be predicted if fundamental frequency was affected by either 
surgery or estrogen. 
   Those who have treated male-to-female transsexuals have recognized the 
importance of changing fundamental frequency.  Bralley, Bull, Gore, and 
Edgerton (1978) as well as Kalra (1977) presented data that illustrated how 
one male-to-female transsexual elevated fundamental frequency following vocal 
rehabilitation.  The individual described by Bralley et al. (1978) elevated 
fundamental frequency from 145 to 165 Hz.  Although the voice was higher in 
pitch and judged more feminine, the investigators reported that it could still 
be distinguished from female voices.  These findings suggest that alteration 
of fundamental frequency alone is not enough to achieve perception of feminine 
gender.  Kalra's patient raised fundamental frequency from 168 to 200 Hz and 
used Froeschels' chewing method (1952) in an attempt to increase anterior oral 
resonance.  Kalra recognized the importance of accentuating anterior oral 
resonance to accommodate the newly acquired higher pitch; however, he did not 
provide data to substantiate change in vocal tract resonance. 
   Other investigators also have recognized the need to alter vocal parameters 
besides fundamental frequency.  Wolfe, Ratusnik, and Northrop (1980) reported 
high negative correlations between ratings on a femininity-masculinity scale 
and means of five vocal characteristics (F0, formant frequency, extent of 
upward inflections, extent of downward inflections, and extent of both upward 
and downward inflections).  These findings as well as similar observations 
reported by Pronovost (1942), Snidecor (1951), and Coleman (1971, 1976) 
provide further support for the notion that other acoustic parameters in 
addition to frequency influence gender identification. 
   Vocal tract resonance characteristics may be the second most important 
acoustic cue to speaker identification.  Peterson and Barney (1952) and 
Ladefoged and Broadbent (1957) found that females have higher average vowel 
formant frequencies than males.  The importance of vocal tract resonances as a 
cue to speaker sex identification was shown by Coleman (1971).  He reported 
that listeners correctly identified speaker sex 88% of the time when listening 
to both sexes produce artificial larynx speech having a fundamental of 85 Hz.  
In 1976 Coleman investigated further the importance of vocal tract resonance 
and fundamental frequency on gender identification.  In one experiment, male 
and female speakers produced speech samples using normal voice and in another 
they used an artificial larynx.  When speakers used normal voice, vocal tract 
resonance and fundamental frequency were both important to male-female 
identification.  When they used artificial voice and when vocal tract 
resonance characteristics of one sex were combined with F0 characteristics of 
the opposite sex, listeners generally identified the speaker as male.  This 
was true whether a male F0 was combined with female vocal tract resonance or 
whether a female F0 was combined with male vocal tract resonance. 
   These cumulative findings lead to the hypotheses that for male-to-female 
transsexuals (1) raising the F0 alone will likely result in perception of male 
voice, and (2) raising the F0 and the vocal tract resonance simultaneously 
will likely result in the perception of female voice.  To test these 
hypotheses, a treatment plan was devised that focused on changing both the 
laryngeal tone and its resonance.  Consequent frequencies were measured. 
   Successful procedures for increasing fundamental frequency of male-to-
female transsexuals' voices were well established; but procedures for changing 
vocal tract resonance were not.  Yet resonance of the vocal tract can be 
altered at will, as is demonstrated during vowel production.  According to 
Fant (1956), vowels are primarily the product of the voice source and the 
filtering action of the vocal tract.  The resonant frequencies of the vocal 
tract, F1, F2, etc., are determined by characteristics of the tract including 
size and shape (Delattre, 1951; Fant, 1962).  Narrowing of the oral-pharyngeal 
cavities during production of a high front vowel such as /i/ is produced by 
elevation of the mandible, which results in low F1 values, and anterior tongue 
carriage, which results in high-frequency resonance for F2.  Resonance 
produced with excessive anterior tongue carriage has been referred to as 
"thin" by Boone (1971) and Fisher (1975).  Here, "thin vocal tract resonance" 
and "upward movement of the second vowel formant frequencies" are used 
synonymously and are thought to be the result of anterior tongue carriage.  
Thus, upward and downward movements of F2 were chosen to represent change in 
vocal tract resonance for this study.  In therapy, the patient's natural 
abilities were used to produce voluntarily a higher laryngeal fundamental and 
to enhance it by forward carriage of the tongue, thereby raising second 
formant vowel frequency.  Pre- and posttreatment frequency measurements of the 
fundamental and the vowel second formant were made to quantify treatment 
   The purpose of this report is to describe the treatment provided to a 
postoperative male-to-female transsexual, present acoustic data from 
pretreatment and posttreatment voice samples, and speculate about the 
relationship between acoustic change and perception of feminine voice. 
   Equipment used during diagnosis and treatment included (1) Kay Elemetrics 
Visi-pitch, model 6087A, attached to a Tektronix oscilloscope, model 5113; (2) 
Voice Identification 700 Series sound spectrograph with attached full track 
reel-to-reel Crown tape recorder, model IM7; (3) portable Sony cassette tape 
recorder; (4) Philco minicassette recorder; (5) Bell and Howell Language 
Master; and (6) Sony U-matic video recorder and camera.  The Visi-pitch was 
used to assess fundamental frequency and relative intensity of utterances and 
to provide visual feedback regarding these parameters.  A Language Master and 
tape recorders were used to provide auditory feedback, while the video 
recorder was used to provide both auditory and visual feedback.  The sound 
spectrograph was used for acoustic analysis of selected speech samples over 
time. 
   The patient was a 63-year-old individual who began hormone treatment a year 
before undergoing male-to-female reassignment surgery.  Six months after 
surgery she came to the speech clinic for evaluation.  Her chief complaint was 
a low-pitched voice, which was not perceived as female, especially over the 
telephone. 
   The patient was audio- and video-recorded throughout the diagnostic 
session.  During conversational speech and production of sustained vowels, she 
spoke in a full, resounding, low-pitched voice appropriate for a bass male 
speaker as confirmed by measurement of fundamental frequency from 
spectrograms.  Average fundamental frequency of sustained /i, a, u/ was 110 Hz 
with fundamental frequency at the lowest and highest pitch levels ranging from 
110 to 340 Hz.  In conversational speech prosodic patterns and voice quality 
were judged normal for a male; however, hard glottal attacks were noted 
occasionally.  When asked to demonstrate the female-type voice she had been 
attempting to develop on her own, the patient's vocal characteristics changed 
remarkably.  Although pitch was higher, upward and downward inflection 
patterns were bizarre and not consistent with sentence structure.  The speech 
resembled that of a male amateur comedian trying to imitate a female. 
   The patient also was seen by ENT for indirect laryngoscopy.  The structure 
and function of the vocal folds were judged normal. 
   Goals.  The overall goal of treatment was to train the patient to effect at 
will a voice that was perceived as feminine.  Primary goals were to (1) train 
successively higher pitch levels while avoiding vocal abuse, and (2) modify 
tongue carriage to achieve higher resonance characteristics of the vocal 
tract.  Secondary goals were to (1) promote a breathy vocal attack, and (2) 
establish appropriate inflection patterns at higher pitch levels.  The 
rationales were (1) higher fundamental frequency is associated with feminine 
voice; (2) higher resonance characteristics are associated with female voices; 
(3) breathy vocal attacks contribute to vocal health and are considered by 
some (Money and Primrose, 1969) to exemplify feminine gender; and (4) 
inappropriate inflection patterns call attention to the voice as different and 
   Stimulus.  Stimulus materials were selected or constructed to achieve the 
specified goals.  To increase fundamental frequency and change resonance 
characteristics, words that contained high front vowels and anterior 
consonants were identified.  To encourage easy onset of phonation and the 
adoption of a breathy voice quality, other words beginning with /h/ were 
selected.  Using these two groups of words, phrases and sentences were 
constructed to represent various types of intonation patterns. 
   Methods.  Initially, the patient was required to listen to the female 
clinician's production, study the Visi-pitch display, and attempt to match the 
pitch contours.  From the beginning of treatment, the middle third of the 
patient's frequency range was established as the target fundamental.  In 
general, frequency was raised in increments of 10 Hz until consistently good 
vocal quality could be maintained, using an average fundamental frequency of 
210 Hz.  Because rising inflection patterns were easier for the patient to 
imitate, they were used in the early period of treatment.  To help attain 
high-frequency resonance, the patient was encouraged to listen to the quality 
of voice and note the elevation of the mandible and anterior tongue carriage 
when producing words with high front vowels and anterior consonants.  She was 
directed to maintain this articulatory positioning throughout an utterance to 
effect higher resonance that was labeled "thin" for qualitative purposes. 
   Breathiness on words beginning with /h/ and, later, on words beginning with 
vowels was established by monitoring rise time of intensity patterns displayed 
on the Visi-pitch.  Easy onset of voicing was encouraged at all times. 
   A variety of intonation patterns was practiced while maintaining increased 
fundamental frequency, high frequency resonance, and breathy quality.  Such 
practice was necessary to overcome inappropriate patterns adopted by the 
patient in her pretherapy attempts to develop a feminine voice.  These 
behaviors were established first in the clinic and, later, during "self-
modeling" home practice using audio recordings of her best efforts in therapy. 
   In the final months of therapy, role-play situations stressing functional 
conversations were practiced both in and outside the clinic.  Assignments 
outside the clinic required conversations in person and over the telephone 
with people unknown to the patient.  These conversations were recorded to 
assess appropriateness of vocal behaviors.  The patient did not consider her 
speech and voice acceptable until she was referred to as "Ma'am" over the 
telephone.  Thus, work with the telephone continued until feminine references 
   Prior to the end of treatment an otolaryngological exam indicated normal 
supraglottic and glottic structures at rest and during phonation.  Treatment 
was terminated following 88 1-hour sessions over an 11-month period.  
Maintenance was evaluated five years posttreatment. 
   Broad band (300 Hz) and narrow band (45 Hz) amplitude cross-section 
spectrograms were produced for /i, a, u/ vowels at the beginning and end of 
treatment and five years thereafter.  Narrow band sections were made in the 
middle of the vowel at the most stationary portion of the second formant.  The 
center frequencies of formants 1, 2, and 3 were judged to be equal to the 
frequency of the maximum harmonic of the first, second, and third spectral 
envelopes, respectively, or at a point half-way between two adjacent high-
amplitude harmonics when two relatively equal central harmonics were present.  
Fundamental frequency was estimated by counting the vertical striations 
(representing the laryngeal pitch periods) in a 100-msec segment which 
corresponded to the most stationary portion of F2.  Starrett model 120Z 
machinists dial calipers were used to measure F0 and F1, F2, and F3.  The 
formulas were as follows: 
   1.  F0 = N(10) where N is the number of pitch periods in a 100-msec 
segment 
   2.  F1, F2, or F3 = (X/3 + Y) / (X/3000) 
   where X is the measurement in thousandths of an inch from 1000 to 4000 Hz 
calibration marks on a particular spectrogram, and Y is the caliper reading 
for the formant being measured. 
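   The two formulas can be sketched in a few lines of Python.  This is a 
minimal illustration, not the authors' procedure: it reads the second formula 
as (X/3 + Y) / (X/3000), which assumes that Y is measured upward from the 
1000 Hz calibration mark, and the function names and the sample caliper 
values (X = 1500, Y = 546) are hypothetical. 

```python
def f0_from_striations(n_periods, window_ms=100.0):
    """Estimate F0 in Hz from pitch periods counted in a time window.

    With the 100-msec window used in the text this reduces to N * 10.
    """
    return n_periods * (1000.0 / window_ms)


def formant_from_caliper(x_thou, y_thou):
    """Convert a caliper reading to a formant frequency in Hz.

    x_thou: distance between the 1000 Hz and 4000 Hz calibration marks,
            in thousandths of an inch (spans 3000 Hz).
    y_thou: caliper reading for the formant, in thousandths of an inch.
            ASSUMPTION: measured upward from the 1000 Hz mark.
    """
    thou_per_1000_hz = x_thou / 3.0   # paper distance that represents 1000 Hz
    thou_per_hz = x_thou / 3000.0     # paper distance that represents 1 Hz
    return (thou_per_1000_hz + y_thou) / thou_per_hz


if __name__ == "__main__":
    # Eleven striations in 100 msec correspond to 110 Hz.
    print(f0_from_striations(11))
    # Hypothetical spectrogram on which 3000 Hz spans 1.500 inch.
    print(formant_from_caliper(1500, 546))
```

   Under the hypothetical calibration distance of 1.500 inch, one thousandth 
of an inch corresponds to 2 Hz, so a reading difference of 0.004 inch would 
amount to 8 Hz.  The actual value of X varies from spectrogram to spectrogram. 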
   Two judges trained in acoustic analysis independently measured F0 and F1-F3 
for each vowel.  When examining the broad band spectrograms the two judges 
agreed 100% of the time on the number of vertical striations present within 
each 100-msec segment.  When measuring the formants, the two judges agreed 
100% of the time on selection of the point representing the peak of each 
spectral envelope.  Because good reliability was obtained, formant 
measurements of only one of the judges were used.  On repeated measurement of 
the entire sample by this judge, the greatest difference in dial caliper 
readings was +0.004 inch or 8 Hz at 1053 Hz and 2092 Hz for F2 of /a/ and /i/, 
respectively.  When dial caliper readings were compared with a second judge, 
the greatest difference in readings was 0.006 inch or 11 Hz at 3467 Hz for F3 
of /i/.  These differences in the caliper readings were regarded as 
insignificant because such small changes in these formants are probably 
linguistically irrelevant when submitted to listeners for perceptual judgments 
(Flanagan, 1955; Mermelstein and Fitch, 1976). 
   Table 1 provides frequency values for F0-F3 for /i, a, u/ vowels at the 
initiation (T1), termination (T2), and five years following (T3) treatment.  
In general, as fundamental frequency increased, formant frequencies 
increased.  For example, five years posttreatment the mean for F0 increased to 
222 Hz and F2 for all vowels was at its highest level. 
   Figure 1 shows that at the beginning of treatment average fundamental 
frequency for /a/ was 110 Hz and after four months of treatment it increased 
to 210 Hz.  F0 stabilized at 210 Hz throughout the remainder of treatment.  
The shaded area represents average F0 values of /a/ for males and females 
studied by Peterson and Barney (1952). 
   Figure 2 shows second formant frequency values for /i, a, u/ at each of the 
three time periods (T1-T3).  The lower and upper limits of the shaded areas in 
the figure depict male and female average F2 values for these vowels as 
reported by Peterson and Barney.  Note that the patient's F2 values for all 
vowels illustrate a constant rise toward the female frequencies.  When F2 
values for the beginning and end of treatment were compared, F2 for /i/ 
increased the most (291 Hz) followed by /u/ (255 Hz) and /a/ (94 Hz).  
Posttreatment F2 values continued to increase but at a lesser rate, with the 
greatest change occurring for /u/ (103 Hz), followed by /a/ (44 Hz), and the 
smallest change occurring for /i/ (6 Hz). 
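   The arithmetic behind these figures can be checked against the values 
reported in Table 1.  The sketch below simply transcribes the table into a 
dictionary; the variable and function names are illustrative and are not part 
of the original analysis. 

```python
# Table 1 values in Hz, keyed by time period and vowel: (F0, F1, F2, F3).
TABLE_1 = {
    "T1": {"i": (110, 222, 2092, 3467),
           "a": (110, 606, 1053, 2331),
           "u": (110, 277, 810, 1944)},
    "T2": {"i": (195, 278, 2383, 3505),
           "a": (210, 829, 1147, 2718),
           "u": (210, 322, 1065, 2458)},
    "T3": {"i": (235, 371, 2389, 3363),
           "a": (230, 955, 1191, 2853),
           "u": (200, 389, 1168, 2151)},
}

F0, F1, F2, F3 = range(4)  # column positions within each tuple


def mean_f0(period):
    """Mean F0 across the three vowels for one time period."""
    values = [row[F0] for row in TABLE_1[period].values()]
    return sum(values) / len(values)


def f2_change(vowel, start, end):
    """Change in F2 (Hz) for one vowel between two time periods."""
    return TABLE_1[end][vowel][F2] - TABLE_1[start][vowel][F2]


if __name__ == "__main__":
    for period in ("T1", "T2", "T3"):
        print(period, "mean F0:", round(mean_f0(period)))
    for vowel in ("i", "a", "u"):
        print(vowel,
              "T1 to T2:", f2_change(vowel, "T1", "T2"),
              "T2 to T3:", f2_change(vowel, "T2", "T3"))
```

   The computed means (110, 205, and about 222 Hz) and the F2 differences 
agree with the values quoted in the text. 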
   Table 1.  F0-F3 in Hertz for /i, a, u/ Vowels at Initiation (T1), 
Termination (T2), and Five Years after (T3) Treatment 
   Time    Vowel   F0      F1      F2      F3 
   T1      /i/     110     222     2092    3467 
           /a/     110     606     1053    2331 
           /u/     110     277     810     1944 
   T2      /i/     195     278     2383    3505 
           /a/     210     829     1147    2718 
           /u/     210     322     1065    2458 
   T3      /i/     235     371     2389    3363 
           /a/     230     955     1191    2853 
           /u/     200     389     1168    2151 
   Figure 1.  Mean fundamental frequency of /a/ at specified recording times 
compared to male and female mean values (Peterson and Barney, 1952) 
represented by lower and upper limits of shaded area.  
   Figure 2.  Patient's second formant values for /i, a, u/ vowels at 
initiation (T1), termination (T2), and 5 years following (T3) treatment 
compared to male and female mean values (Peterson and Barney, 1952) 
represented by lower and upper limits of shaded areas. 
   Although a fundamental frequency comparable to that of females was obtained 
after four months of therapy, the patient was not perceived as female on the 
telephone.  This did not occur until six months later, near the termination of 
treatment.  It is not surprising that she was still perceived as male because 
alteration of fundamental frequency alone is not enough to achieve perception 
of feminine gender (Coleman, 1976; Bralley, Bull, Gore, and Edgerton, 1978).  
Although treatment for the present study was directed toward increasing both 
F0 and F2, F0 values had reached those appropriate for females within the 
first four months, while F2 had not.  (See Figures 1 and 2).  At the 
initiation of treatment, the patient's F2 values for the three vowels were 
below the means for male speakers studied by Peterson and Barney (1952).  
Midway through treatment, the patient's F2 values crossed the male frequency 
averages and began to rise toward the female means.  At the end of treatment, 
the patient's F2 values had exceeded the female means for /u/, were halfway 
between the male and female averages for /a/, and were about one-third of the 
way between the male and female means for /i/.  Although the patient did not 
achieve female values for F2 in every instance, her resonant characteristics 
when coupled with the feminine F0 apparently were sufficiently close to those 
for females to elicit feminine perception.  She maintained female vocal 
characteristics five years posttreatment and reported continued success in 
being perceived as a female over the telephone. 
   There seems to be a range of acceptable fundamental and formant frequencies 
necessary for identification of voice as female.  The patient first achieved 
F0 frequencies appropriate for females, then gradually achieved F2 values for 
this gender.  Thus, by the end of treatment F2 values were perceptually 
consonant with those for F0 allowing perception of feminine voice when visual 
clues were not available.  It is assumed that perception of femininity was 
accomplished by effecting a change in the resonance cavity through forward 
movement of the tongue.  This treatment study lends support to the hypothesis 
that when a disparity exists between a female F0 and a male F2 in the 
transsexual voice, perception follows F2 and that only when consonance occurs 
between F0 and F2 will perception follow the female F0. 
   Baker, H., and Green, R. (1970). Treatment of transsexualism. Curr. 
Psychiatric Ther. 10:88-89. 
   Boone, D. (1971). The Voice and Voice Therapy. Englewood Cliffs, NJ: 
   Bralley, R., Bull, G., Gore, C., and Edgerton, M. (1978). Evaluation of 
vocal pitch in male transsexuals. J. Commun. Disord. 11:443-449. 
   Coleman, R. O. (1971). Male and female voice quality and its relationship 
to vowel formant frequencies. J. Speech Hear. Res. 14:565-577. 
   Coleman, R. O. (1976). A comparison of the contributions of two voice 
quality characteristics to the perception of maleness and femaleness in the 
voice. J. Speech Hear. Res. 19:168-180. 
   Fant, C. (1956). On the predictability of formant levels and spectrum 
envelopes from formant frequencies. In M. Halle, H. Lunt, and H. MacLean 
(eds.), For Roman Jakobson. The Hague: Mouton. 
   Fant, C. (1962). Descriptive analysis of the acoustic aspects of speech. 
Logos, 5:3-17. 
   Fisher, H. (1975). Improving Voice and Articulation. Utica: H. M. 
   Flanagan, J. (1955). A difference limen for vowel formant frequency. J. 
Acoust. Soc. Am. 27:765-768. 
   Froeschels, E. (1952). Chewing method as therapy. Arch. Otolaryng. 56:427-
   Kalra, M. (1977). Voice therapy with a transsexual. In R. Gemme and C. 
Wheeler (eds.), Progress in Sexology. New York: Plenum Press. 
   Ladefoged, P., and Broadbent, D. (1957). Information conveyed by vowels. J. 
Acoust. Soc. Am. 29:98-104. 
   Mermelstein, P., and Fitch, H. (1976). Difference limens for formant 
frequencies for steady state and consonant-bounded vowels. Paper presented at 
the 92nd meeting of the Acoustical Society of America. 
   Money, J., and Primrose, C. (1969). Sexual dimorphism and dissociation in 
the psychology of male transsexuals. In R. Green and J. Money (eds.), 
Transsexualism and Sex Reassignment. Baltimore: Johns Hopkins. 
   Peterson, G., and Barney, H. (1952). Control methods used in a study of the 
vowels. J. Acoust. Soc. Am. 24:175-184. 
   Pronovost, W. (1942). An experimental study of methods for determining 
natural and habitual pitch. Speech Monogr. 9:111-123. 
   Snidecor, J. (1951). The pitch and duration characteristics of superior 
female speakers during oral reading. J. Speech Hear. Disord. 16:44-52. 
   Wolfe, V. I., Ratusnik, D. L., and Northrop, G. (1980). Vocal 
characteristics of male transsexuals on a masculinity-femininity dimension. In 
The Proceedings of the Congress of the International Association of 
Logopedics and Phoniatrics. Vol 1, pp. 469-474. 
