【Foundation】Feature Extraction of Speech Signals in MATLAB: Understanding MFCC and LPCC Features
Published: 2024-09-14 06:03:11
# 2.1 Theoretical Foundation of MFCC Features
### 2.1.1 Time-Frequency Analysis of Speech Signals
Speech signals are time-varying: their frequency content and amplitude change over time. Common techniques for analyzing their time-frequency characteristics include the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC).
STFT splits a speech signal into a series of short, usually overlapping, windows and computes the Fourier transform of each one to obtain its frequency spectrum. Stacking the spectra of successive windows along the time axis forms a time-frequency diagram (spectrogram) of the speech signal.
### 2.1.2 Mel-Frequency Cepstral Coefficients
Mel-Frequency Cepstral Coefficients (MFCC) are features designed around the characteristics of human auditory perception. The human ear does not resolve all frequencies equally: it distinguishes small frequency differences more finely at low frequencies than at high ones. MFCC maps the frequency spectrum of the speech signal onto the Mel frequency scale to approximate this behavior.
The Mel frequency scale is a nonlinear scale on which equal intervals correspond to roughly equal perceived differences in pitch. Mapping the spectrum of the speech signal onto the Mel scale, taking the logarithm, and applying a cepstral transform yields the Mel-frequency cepstrum of the signal.
# 2. MFCC Feature Extraction
### 2.1 Theoretical Foundation of MFCC Features
#### 2.1.1 Time-Frequency Analysis of Speech Signals
Speech signals are time-varying signals, and their spectra change continuously over time. Common methods for analyzing their time-frequency characteristics include the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC).
STFT splits the speech signal into a series of short frames, over each of which the signal can be treated as approximately stationary, and computes the Fourier transform of each frame. The time-frequency characteristics of the speech signal can then be represented as a spectrogram.
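The STFT described above can be sketched in a few lines of base MATLAB. This is a minimal illustration, not a reference implementation: the sampling rate, the 25 ms frame length, the 10 ms hop, the FFT size, and the synthetic 1 kHz test tone are all assumptions chosen for the example, and the Hamming window is written out by hand so that no toolbox is required.

```matlab
% Minimal STFT sketch using only base MATLAB functions.
fs = 16000;                          % sampling rate in Hz (assumed)
t  = (0:fs-1)' / fs;                 % one second of sample times
x  = sin(2*pi*1000*t);               % synthetic 1 kHz tone standing in for speech

frameLen = 400;                      % 25 ms frame (assumed)
hop      = 160;                      % 10 ms hop between frames (assumed)
nfft     = 512;                      % FFT length, at least frameLen

% Hamming window written out explicitly to avoid toolbox dependencies
n   = (0:frameLen-1)';
win = 0.54 - 0.46*cos(2*pi*n/(frameLen-1));

numFrames = 1 + floor((length(x) - frameLen) / hop);
S = zeros(nfft/2 + 1, numFrames);    % magnitude spectrogram, one column per frame

for k = 1:numFrames
    idx     = (k-1)*hop + (1:frameLen)';   % sample indices of frame k
    frame   = x(idx) .* win;               % windowed frame
    X       = fft(frame, nfft);            % zero-padded FFT
    S(:, k) = abs(X(1:nfft/2 + 1));        % keep non-negative frequencies only
end

% Frequency axis for the rows of S, and the strongest bin of the first frame
f = (0:nfft/2)' * fs / nfft;
[~, peakBin] = max(S(:, 1));
peakFreq = f(peakBin);
```

Each column of `S` is the spectrum of one frame, so plotting `S` (for example with `imagesc`) gives the spectrogram. For this test tone, `peakFreq` should land at or very near 1000 Hz.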
#### 2.1.2 Mel-Frequency Cepstral Coefficients
Mel-Frequency Cepstral Coefficients (MFCC) are features based on human auditory perception. The method maps the short-time spectrum of the speech signal onto the Mel frequency scale and then computes cepstral coefficients from the energies in the Mel frequency bands.
The Mel frequency scale is a nonlinear frequency scale that simulates human auditory perception of frequency. The Mel intervals are smaller at lower frequencies and larger at higher frequencies.
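To make the spacing concrete, the sketch below uses one widely used Hz-to-Mel formula, `m = 2595 * log10(1 + f/700)`. The source does not state which formula it has in mind, so this choice is an assumption, as are the helper names `hz2mel` and `mel2hz`, the sampling rate, and the number of bands. Band edges are spaced uniformly in Mel and then converted back to Hz.

```matlab
% Hz <-> Mel conversion using one common formula (an assumption; variants exist).
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10.^(m / 2595) - 1);

fs       = 16000;                    % sampling rate in Hz (assumed)
numBands = 10;                       % number of Mel bands (assumed)

% Band edges equally spaced on the Mel scale, from 0 Hz up to Nyquist
melEdges = linspace(hz2mel(0), hz2mel(fs/2), numBands + 2);
hzEdges  = mel2hz(melEdges);

% Width in Hz between neighboring edges: grows with frequency
bandWidthsHz = diff(hzEdges);
disp(round(bandWidthsHz));
```

Edges that are uniform in Mel turn out narrow in Hz at low frequencies and progressively wider at high frequencies, which is the behavior the paragraph above describes.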
The cepstral coefficients are obtained by taking the logarithm of the energy in each Mel frequency band and then applying a discrete cosine transform (DCT) to those log energies. The resulting coefficients, usually only the first few, form the MFCC features of the speech signal.
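One common way to write the cepstral step is as an unnormalized DCT-II of the log Mel band energies. The symbols here are introduced for illustration and do not come from the source: $E_m$ is the energy in Mel band $m$, $M$ is the number of Mel bands, and $N$ is the number of coefficients kept. Scaling conventions differ between implementations.

$$
c_n = \sum_{m=1}^{M} \log(E_m)\,\cos\!\left[\frac{\pi n}{M}\left(m - \frac{1}{2}\right)\right], \qquad n = 0, 1, \dots, N-1
$$

For $n = 0$ the cosine term equals 1, so $c_0$ is simply the sum of the log band energies and tracks the overall level of the frame.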
### 2.2 Practical Application of MFCC Feature Extraction
#### 2.2.1 MFCC Feature Extraction Algorithm
The MFCC feature extraction algorithm mainly includes the following steps:
1. **Pre-emphasis:** Apply a pre-emphasis (high-pass) filter to the speech signal to boost the high-frequency components and compensate for their attenuation.
2. **Framing:** Segment the speech signal into overlapping frames.
3. **Windowing:** Apply a window to each frame to reduce spectral leakage at frame boundaries.
4. **Fourier Transform:** Perform the Fourier Transform on each windowed signal to obtain the time-frequency spectrogram.
5. **Mel Filtering:** Map the time-frequency spectrogram onto the Mel frequency scale to obtain the Mel spectrogram.
6. **Cepstral Transformation:** Apply a cepstral transformation to the Mel spectrogram to obtain the MFCC features.
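The six steps above can be sketched end to end in base MATLAB. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the pre-emphasis coefficient 0.97, the 25 ms / 10 ms framing, the 512-point FFT, the 26 triangular filters, the 13 retained coefficients, the Hz-to-Mel formula, and the synthetic two-tone input are all choices made for the example. The window and the DCT are written out by hand so that no toolbox is needed.

```matlab
% MFCC sketch following the six steps above (base MATLAB, R2016b or later
% because it relies on implicit expansion).
fs = 16000;                                   % sampling rate in Hz (assumed)
t  = (0:fs-1)' / fs;
x  = sin(2*pi*440*t) + 0.5*sin(2*pi*2000*t);  % synthetic stand-in for speech

% 1. Pre-emphasis: y[n] = x[n] - alpha * x[n-1], boosts high frequencies
alpha = 0.97;                                 % typical value (assumed)
y = filter([1 -alpha], 1, x);

% 2. Framing: overlapping frames stored as columns
frameLen  = 400;                              % 25 ms
hop       = 160;                              % 10 ms
numFrames = 1 + floor((length(y) - frameLen) / hop);
idx    = (0:frameLen-1)' + (0:numFrames-1) * hop + 1;
frames = y(idx);                              % frameLen x numFrames

% 3. Windowing with a hand-written Hamming window
n   = (0:frameLen-1)';
win = 0.54 - 0.46*cos(2*pi*n/(frameLen-1));
frames = frames .* win;

% 4. Fourier transform of each frame, converted to a power spectrum
nfft = 512;
X = fft(frames, nfft);                        % fft works column by column
P = abs(X(1:nfft/2 + 1, :)).^2 / nfft;

% 5. Mel filtering with triangular filters spaced uniformly in Mel
numFilters = 26;
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10.^(m / 2595) - 1);
melPts = linspace(hz2mel(0), hz2mel(fs/2), numFilters + 2);
binPts = floor((nfft + 1) * mel2hz(melPts) / fs) + 1;   % 1-based FFT bins
H = zeros(numFilters, nfft/2 + 1);
for m = 1:numFilters
    lo = binPts(m); mid = binPts(m+1); hi = binPts(m+2);
    for k = lo:mid                            % rising edge of the triangle
        H(m, k) = (k - lo) / max(mid - lo, 1);
    end
    for k = mid:hi                            % falling edge of the triangle
        H(m, k) = (hi - k) / max(hi - mid, 1);
    end
end
logMel = log(H * P + eps);                    % eps guards against log(0)

% 6. Cepstral transformation: unnormalized DCT-II of the log Mel energies
numCoeffs = 13;
mc = (1:numFilters) - 0.5;
D  = cos(pi * (0:numCoeffs-1)' * mc / numFilters);  % numCoeffs x numFilters
mfcc = D * logMel;                            % numCoeffs x numFrames
```

Each column of `mfcc` holds the coefficients for one frame. With real speech, `x` would be replaced by the recorded samples (for example, read with `audioread`) and `fs` by the file's sampling rate.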