Implementing MFCC in Python
Time: 2023-09-27 15:09:00 Views: 102
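MFCC (Mel Frequency Cepstral Coefficients) is a widely used audio feature extraction method, applied in areas such as speech recognition and music information retrieval. Below is a simple Python implementation.
First, import the required libraries: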
```python
import numpy as np
from scipy.fft import dct  # scipy.fftpack is the legacy interface; scipy.fft provides the same dct
```
Next, define a function that performs the MFCC feature extraction:
```python
def mfcc(signal, samplerate=16000, winlen=0.025, winstep=0.01, numcep=13, nfilt=26, nfft=512, lowfreq=0, highfreq=None, preemph=0.97, ceplifter=22):
    """
    Compute MFCC features from a 1-D audio signal.

    Returns an array with one row per frame and numcep columns.
    """
    signal = np.asarray(signal, dtype=float)
    # Remove the DC offset, then pre-emphasize: y[n] = x[n] - preemph * x[n-1]
    signal = signal - np.mean(signal)
    signal = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # Framing: split into overlapping frames and apply a Hamming window
    frame_len = int(round(winlen * samplerate))
    frame_step = int(round(winstep * samplerate))
    if len(signal) < frame_len:
        # Zero-pad short signals so that at least one full frame exists
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // frame_step
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (rfft zero-pads or truncates frames to nfft)
    pspec = np.abs(np.fft.rfft(frames, nfft)) ** 2
    if highfreq is None:
        highfreq = samplerate / 2
    else:
        highfreq = min(highfreq, samplerate / 2)
    # Filter edges: evenly spaced on the mel scale, then mapped to FFT bin indices
    low_mel = 2595 * np.log10(1 + lowfreq / 700)
    high_mel = 2595 * np.log10(1 + highfreq / 700)
    mel_points = np.linspace(low_mel, high_mel, nfilt + 2)
    hz_points = 700 * (10 ** (mel_points / 2595) - 1)
    bins = np.floor((nfft + 1) * hz_points / samplerate).astype(int)
    # Triangular filterbank
    fbank = np.zeros((nfilt, nfft // 2 + 1))
    for i in range(nfilt):
        lbin, cbin, hbin = bins[i], bins[i + 1], bins[i + 2]
        if cbin > lbin:  # rising edge
            fbank[i, lbin:cbin] = (np.arange(lbin, cbin) - lbin) / (cbin - lbin)
        if hbin > cbin:  # falling edge
            fbank[i, cbin:hbin] = (hbin - np.arange(cbin, hbin)) / (hbin - cbin)
    # Apply filterbank and take the log (zeros are replaced to avoid log(0))
    feat = np.dot(pspec, fbank.T)
    feat = np.where(feat == 0, np.finfo(float).eps, feat)
    feat = np.log(feat)
    # DCT: keep the first numcep coefficients
    feat = dct(feat, type=2, axis=1, norm='ortho')[:, :numcep]
    # Cepstral lifter
    if ceplifter > 0:
        lifter = 1 + (ceplifter / 2) * np.sin(np.pi * np.arange(feat.shape[1]) / ceplifter)
        feat = feat * lifter
    return feat
```
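The parameters are as follows: `signal` is the input audio signal (a 1-D array of samples), `samplerate` is the sampling rate in Hz, `winlen` is the window length in seconds, `winstep` is the step between successive windows in seconds, `numcep` is the number of cepstral coefficients kept per frame, `nfilt` is the number of triangular filters in the filterbank, `nfft` is the FFT length, `lowfreq` and `highfreq` bound the frequency range of the filterbank in Hz (`highfreq` defaults to half the sampling rate), `preemph` is the pre-emphasis coefficient, and `ceplifter` is the cepstral liftering coefficient.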
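To make the roles of `nfilt`, `nfft`, `lowfreq` and `highfreq` more concrete, here is a minimal sketch of how a triangular mel filterbank can be built. It assumes the commonly used conversion mel = 2595 · log10(1 + f / 700); the helper names `hz_to_mel`, `mel_to_hz` and `mel_filterbank` are illustrative only and are not part of any library.

```python
import numpy as np

def hz_to_mel(hz):
    """Hz -> mel, assuming the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(nfilt=26, nfft=512, samplerate=16000, lowfreq=0, highfreq=None):
    """Return an array of shape (nfilt, nfft // 2 + 1) of triangular filters."""
    if highfreq is None:
        highfreq = samplerate / 2
    # nfilt filters need nfilt + 2 edge points, evenly spaced in mel
    mel_points = np.linspace(hz_to_mel(lowfreq), hz_to_mel(highfreq), nfilt + 2)
    # Map each edge frequency to an FFT bin index
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / samplerate).astype(int)
    fbank = np.zeros((nfilt, nfft // 2 + 1))
    for i in range(nfilt):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        if mid > lo:  # rising edge from 0 towards 1
            fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / (mid - lo)
        if hi > mid:  # falling edge from 1 down towards 0
            fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / (hi - mid)
    return fbank

fbank = mel_filterbank()
print(fbank.shape)  # (26, 257)
```

Because the edges are evenly spaced in mel rather than in Hz, the filters are narrow at low frequencies and become wider towards `highfreq`.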
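The function returns a 2-D array in which each row holds the MFCC features of one audio frame. These features can then be used as input to downstream tasks such as speech recognition.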
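The number of rows in the returned array equals the number of frames, which follows from the signal length, `winlen` and `winstep`. The sketch below illustrates one common framing convention, in which trailing samples that do not fill a whole frame are dropped (an assumption; some implementations zero-pad the last frame instead). The helper name `frame_signal` is hypothetical.

```python
import numpy as np

def frame_signal(signal, samplerate=16000, winlen=0.025, winstep=0.01):
    """Split a 1-D signal into overlapping, Hamming-windowed frames."""
    frame_len = int(round(winlen * samplerate))    # samples per frame
    frame_step = int(round(winstep * samplerate))  # samples between frame starts
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad very short signals so that one full frame exists
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // frame_step
    # Row i of idx holds the sample indices belonging to frame i
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(num_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

# One second of a 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000
frames = frame_signal(np.sin(2 * np.pi * 440 * t))
print(frames.shape)  # (98, 400)
```

Under this convention, one second of audio at 16 kHz with the default 25 ms window and 10 ms step gives 1 + (16000 - 400) // 160 = 98 frames, so the MFCC array would have 98 rows.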