Based on the principles of speech endpoint detection (the dual-threshold method, the autocorrelation method, the spectral entropy method, etc.), write a program implementing a speech endpoint detection function. The function's output parameters are: `voiceseg`, a data structure recording the endpoint information of each speech segment; `vsl`, the length of `voiceseg`; `SF`, the speech-frame flag (SF = 1 marks a speech frame); and `NF`, the noise/silence-frame flag (NF = 1 marks a noise/silence frame).
Below is a simple example of an endpoint-detection function. It uses a single adaptive short-term energy threshold rather than the full dual-threshold scheme, but it produces the `voiceseg`/`vsl`/`SF`/`NF` outputs described above:
```python
import numpy as np
from scipy.signal import butter, lfilter

def butter_bandpass(lowcut, highcut, fs, order=5):
    # Design a Butterworth band-pass filter with cutoffs normalized
    # to the Nyquist frequency.
    nyq = 0.5 * fs
    b, a = butter(order, [lowcut / nyq, highcut / nyq], btype='band')
    return b, a

def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
    # Apply the band-pass filter along the time axis.
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    return lfilter(b, a, data)

def endpoint_detection(signal, fs, frame_len, frame_step, win_len, threshold=0.05):
    # Pre-emphasis to flatten the spectral tilt
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Band-pass the waveform to 300-3000 Hz, where most speech energy lies
    signal = butter_bandpass_filter(signal, 300, 3000, fs, order=6)
    # Framing
    signal_length = len(signal)
    frame_length = int(round(frame_len * fs))
    frame_shift = int(round(frame_step * fs))
    num_frames = 1 + int(np.ceil(max(signal_length - frame_length, 0) / frame_shift))
    pad_signal_length = (num_frames - 1) * frame_shift + frame_length
    pad_signal = np.append(signal, np.zeros(pad_signal_length - signal_length))
    indices = (np.tile(np.arange(frame_length), (num_frames, 1)) +
               np.tile(np.arange(num_frames) * frame_shift, (frame_length, 1)).T)
    frames = pad_signal[indices]
    # Windowing
    frames = frames * np.hamming(frame_length)
    # Short-term energy from the power spectrum, normalized to [0, 1]
    mag_frames = np.abs(np.fft.rfft(frames, win_len))
    pow_frames = (1.0 / win_len) * mag_frames ** 2
    energy = np.sum(pow_frames, axis=1)
    energy /= np.max(energy)
    # Raw frame-level decision against an adaptive energy threshold
    speech = energy > threshold * np.average(energy)
    # Group consecutive speech frames into segments; drop segments
    # shorter than about 100 ms
    padded = np.concatenate(([0], speech.astype(int), [0]))
    changes = np.diff(padded)
    begins = np.where(changes == 1)[0]      # first frame of each segment
    ends = np.where(changes == -1)[0] - 1   # last frame of each segment
    min_frames = max(1, int(round(0.1 / frame_step)))
    keep = (ends - begins + 1) >= min_frames
    begins, ends = begins[keep], ends[keep]
    # SF = 1 marks speech frames, NF = 1 marks noise/silence frames
    SF = np.zeros(num_frames, dtype=int)
    for b, e in zip(begins, ends):
        SF[b:e + 1] = 1
    NF = 1 - SF
    # voiceseg records the begin frame, end frame, and duration
    # (in frames) of each detected speech segment
    voiceseg = [{'begin': int(b), 'end': int(e), 'duration': int(e - b + 1)}
                for b, e in zip(begins, ends)]
    vsl = len(voiceseg)
    return voiceseg, vsl, SF, NF
```
Here `signal` is the input waveform, `fs` is the sampling rate in Hz, `frame_len` and `frame_step` are the frame length and frame shift in seconds, and `win_len` is the FFT length. The return value `voiceseg` is a list of records, one per detected speech segment, each holding the segment's begin frame, end frame, and duration in frames; `vsl` is the number of detected segments (the length of `voiceseg`); `SF` is the speech-frame flag (SF = 1 marks a speech frame) and `NF` is the noise/silence-frame flag (NF = 1 marks a noise/silence frame).
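As a quick sanity check, the function can be exercised on a synthetic signal. This is only a sketch: the 8 kHz sampling rate, the 440 Hz tone burst, and the noise floor are all made up for illustration:

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                # one second of audio
signal = 0.01 * np.random.randn(fs)   # low-level background noise
signal[2000:5000] += np.sin(2 * np.pi * 440 * t[2000:5000])  # tone burst as "speech"

voiceseg, vsl, SF, NF = endpoint_detection(signal, fs,
                                           frame_len=0.025,   # 25 ms frames
                                           frame_step=0.010,  # 10 ms shift
                                           win_len=512)
for seg in voiceseg:
    print(seg)          # {'begin': ..., 'end': ..., 'duration': ...}
print('segments:', vsl)
```

Frame indices can be converted back to time by multiplying by `frame_step`.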
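The question also names the dual-threshold method, which the code above simplifies to a single energy threshold. A minimal sketch of the classic two-stage decision is given below, assuming per-frame `energy` and zero-crossing-rate `zcr` arrays (both normalized to [0, 1]) are already available; the threshold values are illustrative, not tuned:

```python
import numpy as np

def dual_threshold(energy, zcr, high=0.1, low=0.02, zcr_th=0.1):
    # energy, zcr: per-frame features normalized to [0, 1].
    # Returns a 0/1 speech-frame flag array like SF above.
    n = len(energy)
    SF = np.zeros(n, dtype=int)
    for start in np.where(energy > high)[0]:   # frames surely inside speech
        if SF[start]:
            continue                           # already inside a found segment
        # Extend the segment outward while the frame still clears the low
        # energy threshold or shows unvoiced-like zero-crossing activity.
        left = start
        while left > 0 and (energy[left - 1] > low or zcr[left - 1] > zcr_th):
            left -= 1
        right = start
        while right < n - 1 and (energy[right + 1] > low or zcr[right + 1] > zcr_th):
            right += 1
        SF[left:right + 1] = 1
    return SF
```

Frames above the high energy threshold are taken as certain speech, and each such region is then extended outward while either the low energy threshold or the zero-crossing-rate threshold is still exceeded, which helps retain low-energy unvoiced consonants at the segment edges.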