interrelated aims. The first aim is to quantify the effect of
noise by means of acoustic analysis of the vowel and consonant
spectra. A subset of the aforementioned factors, known
to be important for vowel and stop-consonant identification,
will be assessed by comparing the acoustic parameters (e.g.,
spectral tilt) estimated in quiet with those estimated in
noise. The acoustic parameter comparisons are meant to answer
several questions, including the following. (1) How are
the vowel spectral envelopes (critical-band spectra) affected?
(2) How are the first two formant frequencies (F1 and F2), known
to be major cues to vowel recognition, affected? (3) How are
the spectral tilt and frequency of the burst spectra affected?
These and other questions will be answered quantitatively
by performing acoustic analysis of vowels and stop
consonants embedded in −5 to 10 dB noise.
The second aim of this paper is to assess the perceptual
effect of noise on vowel and stop-consonant identification.
This will be done by performing correlation analysis between
the acoustic parameter values and the vowel/stop-consonant
identification scores. The second aim will be addressed
in experiment 1. The results from the acoustic
analysis and experiment 1 taken together will provide valuable
insights into the cues used by listeners to understand
speech in noise. Knowing how noise affects the spectrum of
speech is important for several reasons. First, such knowledge
could help us design better noise reduction algorithms
that could potentially improve hearing-impaired listeners’
speech understanding in noise. Second, it could help us
better understand which speech features are perceptually robust
in additive noise, and consequently which features listeners
attend to when identifying vowels or consonants in
noise.
II. ACOUSTIC ANALYSIS
A. Method
1. Speech material
The vowel material consisted of the vowels in the
words: “heed, hid, hayed, head, had, hod, hud, hood, hoed,
who’d, heard.” The stimuli were drawn from a large multi-talker
vowel set used by Hillenbrand et al. (1995). A total of
66 vowel tokens were used for acoustic analysis: 33 vowels
produced by male speakers and 33 vowels produced by female
speakers. There were 6 tokens for each of the 11 vowels,
3 produced by male speakers and 3 by female speakers.
A total of 20 different male speakers and 23 female speakers
produced the 66 vowel tokens. Each speaker produced only a
subset of the 11 vowels. The vowels were sampled at
16 kHz. Table I gives the steady-state F1 and F2 values of
the vowel stimuli used in this study, as provided by
Hillenbrand et al. (1995). The F1 and F2 values
were sampled at the steady-state portion of the vowel and
averaged across all speakers.
Consonant material consisted of the stop consonants in
VCV context, where V = /i, a, u/ and C = /b, d, g, p, t, k/. The
stimuli were drawn from recordings made by Shannon et al.
(1999). A total of 36 consonant tokens were used for acoustic
analysis: 18 consonants (6 stops × 3 vowel contexts) produced
by a male speaker and 18 consonants produced by a
female speaker. The consonants were sampled at 44.1 kHz.
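The token count above follows from crossing the six stops with the three vowel contexts for each of the two speakers. The enumeration below is a minimal sketch of that arithmetic. The ASCII labels, the `vcv_labels` helper, and the assumption that the same vowel flanks the consonant on both sides are illustrative and are not taken from the original recordings.

```python
from itertools import product

VOWELS = ["i", "a", "u"]                   # vowel contexts (V)
STOPS = ["b", "d", "g", "p", "t", "k"]     # stop consonants (C)
SPEAKERS = ["male", "female"]              # one talker of each sex


def vcv_labels(vowels=VOWELS, stops=STOPS):
    """Return one VCV label per (vowel, stop) pair, e.g. 'aba'.

    Assumes the same vowel on both sides of the consonant.
    """
    return [v + c + v for v, c in product(vowels, stops)]


labels = vcv_labels()
tokens = [(speaker, label) for speaker in SPEAKERS for label in labels]
print(len(labels), len(tokens))  # prints: 18 36
```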
2. Noise
Two types of noise were used: multi-talker babble (two
male and two female talkers) and speech-shaped noise. The
babble was taken from the AudiTEC CD (St. Louis) and was
sampled at 16 kHz. The speech-shaped noise (sampled at
20 kHz) was constructed by filtering white noise through a
60-tap FIR filter with a frequency response that matched the
long-term spectrum of the 11 male and 11 female vowels.
Noise was first up-sampled to the sampling frequency of the
vowel/consonant materials and then added to the vowels at
−5, 0, 5, and 10 dB. Figure 1 shows the averaged long-term
spectra of the multi-talker babble and speech-shaped noise.
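Adding noise at a prescribed level amounts to scaling the masker before summing it with the speech token. The text does not state how the signal and noise levels were measured, so the sketch below assumes the dB values are overall signal-to-noise ratios computed from the RMS level over the whole token. The function names `rms` and `mix_at_snr` and the synthetic signals are illustrative stand-ins, not the stimuli or code used in the study.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the overall speech-to-noise RMS ratio is snr_db.

    Assumes both signals already share one sampling rate and that the
    noise is at least as long as the speech token.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech token")
    segment = noise[:len(speech)]
    # Gain that sets the noise RMS to speech RMS divided by 10^(snr_db / 20).
    gain = rms(speech) / (rms(segment) * 10 ** (snr_db / 20))
    scaled = [gain * n for n in segment]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Synthetic stand-ins for a vowel token and a masker (not the study's stimuli).
random.seed(0)
fs = 16000
speech = [math.sin(2 * math.pi * 500 * t / fs) for t in range(fs // 10)]
noise = [random.gauss(0.0, 1.0) for _ in range(fs // 10)]

for snr_db in (-5, 0, 5, 10):
    _, scaled = mix_at_snr(speech, noise, snr_db)
    achieved = 20 * math.log10(rms(speech) / rms(scaled))
    print(snr_db, round(achieved, 6))
```

Returning the scaled noise alongside the mixture makes it easy to verify the achieved ratio, as the loop does for the four levels listed above.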
3. Acoustic analysis of vowels
Prior to the acoustic analysis, the complete vowel data
set was manually segmented to [h Vowel d]. The starting and
ending times of the vocalic nuclei were measured by hand
TABLE I. Mean F1 and F2 frequencies (in Hz) of the vowels used in this study.

            Had   Hod   Head  Hayed  Heard  Hid   Heed  Hoed  Hood  Hud   Who’d
F1  Male    627   786   555   438    466    384   331   500   424   629   319
    Female  666   883   693   492    518    486   428   538   494   809   435
F2  Male    1910  1341  1851  2196   1377   2039  2311  868   992   1146  938
    Female  2370  1682  1991  2437   1604   2332  2767  998   1102  1391  1384
FIG. 1. (Color online) The long-term spectra of the multi-talker babble and
continuous speech-shaped noise used in this study as maskers.
3876 J. Acoust. Soc. Am., Vol. 118, No. 6, December 2005 G. Parikh and P. C. Loizou: Effect of noise on perception