• What are the basic digital representations of speech signals, and how are they
  used in algorithms for speech processing?
• What are the important applications that are enabled by digital speech
  processing methods?
We begin our study by taking a look at the speech signal and getting a feel for its
nature and properties.
1.1 THE SPEECH SIGNAL
The fundamental purpose of speech is human communication; i.e., the transmission
of messages between a speaker and a listener. According to Shannon’s information
theory [364], a message represented as a sequence of discrete symbols can be
quantified by its information content in bits, where the rate of transmission of information is
measured in bits per second (bps). In speech production, as well as in many
human-engineered electronic communication systems, the information to be transmitted is
encoded in the form of a continuously varying (analog) waveform that can be
transmitted, recorded (stored), manipulated, and ultimately decoded by a human listener.
The fundamental analog form of the message is an acoustic waveform that we call
the speech signal. Speech signals, such as the one illustrated in Figure 1.2, can be
converted to an electrical waveform by a microphone, further manipulated by both analog
and digital signal processing methods, and then converted back to acoustic form by a
loudspeaker, a telephone handset, or headphones, as desired. This form of speech
processing is, of course, the basis for Bell’s telephone invention as well as today’s multitude
of devices for recording, transmitting, and manipulating speech and audio signals. In
Bell’s own words [47],
Watson, if I can get a mechanism which will make a current of electricity vary its
intensity as the air varies in density when sound is passing through it, I can telegraph
any sound, even the sound of speech.
Although Bell made his great invention without knowing about information theory,
the principles of information theory have assumed great importance in the design
of sophisticated modern digital communications systems. Therefore, even though our
focus will be mainly on the speech waveform and its representation in the
form of parametric models, it is nevertheless useful to begin with a discussion of the
information that is encoded in the speech waveform.
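
To make the idea of measuring a message in bits and bits per second concrete, the following is a minimal Python sketch, not a method taken from the text. The function names, the alphabet size of 32 equally likely symbols, and the rate of 10 symbols per second are all illustrative assumptions chosen only to keep the arithmetic simple.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Empirical Shannon entropy of a symbol sequence, in bits per symbol.

    Hypothetical helper: probabilities are estimated from the relative
    frequencies of the symbols in the sequence itself.
    """
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_rate_bps(bits_per_symbol, symbols_per_second):
    """Information rate in bits per second (bps)."""
    return bits_per_symbol * symbols_per_second


# Assumed values for illustration only: an alphabet of 32 equally likely
# symbols carries log2(32) bits per symbol.
bits = math.log2(32)
rate = information_rate_bps(bits, symbols_per_second=10)
print(f"{bits:.1f} bits/symbol, {rate:.1f} bps")

# A sequence that alternates between two symbols with equal frequency
# has an empirical entropy of one bit per symbol.
print(entropy_bits_per_symbol("abab"))
```

Under the equally-likely assumption the bits-per-symbol figure is an upper bound; any non-uniform symbol distribution, such as one estimated with `entropy_bits_per_symbol`, gives a lower value and therefore a lower rate.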
Figure 1.3 shows a pictorial representation of the complete process of producing
and perceiving speech—from the formulation of a message in the brain of a speaker,
to the creation of the speech signal, and finally to the understanding of the message by
a listener. In their classic introduction to speech science, Denes and Pinson
appropriately referred to this process as the “speech chain” [88]. A more refined block diagram
representation of the speech chain is shown in Figure 1.4. The process starts in the
upper left as a message represented somehow in the brain of the speaker. The
message information can be thought of as having a number of different representations
during the process of speech production (the upper path in Figure 1.4). For example