Innovative Methods and Challenges for Speech Endpoint Detection under Low Signal-to-Noise Ratios


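Speech endpoint detection (SED) is a critical technology in today's speech communication and automatic speech recognition systems. Its core task is to separate a speaker's speech segments from non-speech segments (pauses, silence, or noise) in the presence of background noise, and so provide accurate start and end time marks for subsequent speech recognition. Research indicates that more than half of the errors of a speech recognition system originate in endpoint detection, so the performance of this stage has a major effect on overall system accuracy. Traditional methods based on energy and zero-crossing rate are no longer robust enough under low signal-to-noise ratio (SNR) conditions. As demand for high-quality speech communication and efficient speech recognition grows, researchers have in recent years explored new methods to improve endpoint detection in noisy environments. These methods rely mainly on introducing and improving different feature extraction strategies:

1. Frequency band variance: this approach looks at how the signal varies across frequency bands; comparing the characteristics of different bands separates speech from noise more effectively.
2. Hidden Markov model (HMM): statistical modeling, combined with language and acoustic models, captures the dynamics of speech and non-speech sequences and improves detection accuracy.
3. Frequency-domain energy features: analyzing how the signal's energy is distributed in the frequency domain helps locate speech segments against a noisy background.
4. Information entropy: entropy measures the uncertainty of a signal; computing it exposes differences between speech and noise that help endpoint detection.
5. Differential features: time-domain or frequency-domain differences of the signal capture its instantaneous changes and improve noise suppression.

These newer methods are often combined with deep learning, machine learning, and signal processing techniques, such as deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN), to improve adaptability and robustness in complex noise environments. Despite significant progress, speech endpoint detection under low SNR remains a challenge, because noise can mask the features of the speech signal and make detection difficult. Researchers therefore continue to refine algorithms and feature selection to further improve endpoint detection under extreme conditions.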

1 INTRODUCTION

Conversational speech contains many pauses. Speech endpoint detection is the process of deciding which segments of a speech signal containing background noise are speech and which are noise, and of locating the exact beginning and end of the speech [1-4]. Research shows that more than half of the errors of a speech recognition system come from endpoint detection, so the success or failure of a speech recognition system depends to a considerable extent on the accuracy of endpoint detection.

Research on speech endpoint detection has been developing for decades and has produced many methods, but the traditional energy and zero-crossing rate methods are no longer robust under low signal-to-noise ratios. In recent years, driven by strong demand for practical speech communication quality and speech recognition technology, many new methods have appeared. They mainly use new features to improve noise robustness, for example the frequency band variance method, the HMM method, frequency-domain energy, information entropy, differential energy and differential zero-crossing rate, TF parameters, autocorrelation similarity distance, higher-order statistics, the short-time energy-zero-product, and discrimination information [5-8].

This work is supported by the National Natural Science Foundation of China under Grant 61403042 and by the Education Department of Liaoning Province of China under Grant L2013423.

Although speech endpoint detection systems achieve high performance in laboratory environments, their performance deteriorates dramatically under the influence of background noise and the transmission channel in practical environments. For instance, the frequency band variance method encounters pulse interference in practical applications, and those regions can have large short-time feature values, so the threshold is difficult to determine. Although the HMM method has high accuracy, it needs a pre-trained model. The information entropy method can effectively differentiate voiced speech from noise, but it has difficulty differentiating unvoiced sounds from noise. The short-time energy-zero-product method is very simple, but it uses a fixed threshold, which may lead to poor noise robustness. Discrimination information is regarded as a measure of the similarity between signal and noise; its performance is not very good under low signal-to-noise ratio conditions, but it is very good in severely noisy environments. Therefore, we propose a new method based on the short-time energy-zero-product and discrimination information, which gives precise and rapid endpoint detection in severely changing noise environments.
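To make the idea of discrimination information as a similarity measure between signal and noise more concrete, the following is a minimal Python sketch. It assumes, since this excerpt does not give the paper's formula, that discrimination information is the symmetric Kullback-Leibler divergence between the sub-band energy distributions of two frames. The function names, the four equal-width sub-bands, the `eps` floor, and the naive DFT are all illustrative choices, not the authors' implementation.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum for DFT bins 0..N/2, via a naive O(N^2) DFT (demo only)."""
    n = len(frame)
    spectrum = []
    for m in range(n // 2 + 1):
        acc = sum(frame[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_probabilities(frame, num_bands=4, eps=1e-12):
    """Share of the frame's energy in each of num_bands equal-width sub-bands."""
    spectrum = power_spectrum(frame)[1:]  # drop the DC bin
    width = len(spectrum) // num_bands    # leftover high bins are ignored
    # eps keeps every probability strictly positive so the logarithm is defined.
    energies = [sum(spectrum[b * width:(b + 1) * width]) + eps
                for b in range(num_bands)]
    total = sum(energies)
    return [e / total for e in energies]


def discrimination_information(p, q):
    """Symmetric Kullback-Leibler divergence (assumed form, not from the paper)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical demo: a low-pitched tone compared with a high-pitched tone.
N = 64
low_tone = [math.sin(2 * math.pi * 2 * k / N) for k in range(N)]
high_tone = [math.sin(2 * math.pi * 30 * k / N) for k in range(N)]
p_low = subband_probabilities(low_tone)
p_high = subband_probabilities(high_tone)
print(discrimination_information(p_low, p_low))   # identical distributions
print(discrimination_information(p_low, p_high))  # very different distributions
```

Under this assumed definition the measure is zero when a frame's sub-band distribution matches the reference and grows as the two distributions diverge, which is the property a recheck against a noise reference would rely on.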

2 ALGORITHM DESCRIPTION

2.1 Description of Short-Time Energy-Zero-Product

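The product of the short-time energy and the corresponding short-time zero-crossing rate is called the short-time energy-zero-product. For each frame, the short-time energy $E_n$, the short-time zero-crossing rate $Z_n$, and the short-time energy-zero-product $ZE_n$ are defined respectively as [9]:

$$E_n = \sum_{k} s_w^2(k) \tag{1}$$

$$Z_n = \sum_{k} \left| \mathrm{sgn}[s_w(k)] - \mathrm{sgn}[s_w(k-1)] \right| \tag{2}$$

$$ZE_n = E_n \cdot Z_n \tag{3}$$

where the sums run over the samples $k$ of frame $n$.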
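The three per-frame quantities defined in this subsection (short-time energy, short-time zero-crossing rate, and their product) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the frame is taken to be a list of already windowed samples, the sign convention sgn(x) = 1 for x >= 0 and -1 otherwise is assumed, and the helper names `frame_features` and `split_frames` are hypothetical.

```python
import math


def sign(x):
    """Assumed sign convention: 1 for x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1


def frame_features(frame):
    """Return (energy, zero-crossing measure, energy-zero-product) for one frame."""
    energy = sum(s * s for s in frame)
    # Each sign change contributes |1 - (-1)| = 2 to this sum.
    zcr = sum(abs(sign(frame[k]) - sign(frame[k - 1]))
              for k in range(1, len(frame)))
    return energy, zcr, energy * zcr


def split_frames(signal, frame_len, hop):
    """Cut a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


# Hypothetical demo: 64 samples of silence followed by 64 samples of a tone.
signal = [0.0] * 64 + [math.sin(2 * math.pi * 5 * k / 64) for k in range(64)]
for index, frame in enumerate(split_frames(signal, frame_len=32, hop=32)):
    print(index, frame_features(frame))
```

Because the product is large only when both the energy and the zero-crossing measure are large, it drops to zero on digital silence regardless of how the zero-crossing term behaves there.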

Research on Speech Endpoint Detection under Low Signal-to-Noise Ratios

HAN Zhiyan, WANG Jian

College of Engineering, Bohai University, Jinzhou 121000

E-mail: hanzyme@126.com

Abstract: A novel speech endpoint detection algorithm is proposed to improve accuracy under low signal-to-noise ratio (SNR) conditions. The core technique exploits the complementarity between the short-time energy-zero-product and discrimination information: the short-time energy-zero-product algorithm makes the first judgment, and discrimination information based on sub-band energy distribution probabilities then rechecks the decision whenever a transition between a noise frame and a speech frame is encountered. This avoids detection errors caused by sharp changes in noise amplitude and by ending speech frames that are polluted by noise. Moreover, we propose a novel algorithm that dynamically updates the noise energy threshold and tracks changes in noise energy better. Simulation results show that the new method gives precise and rapid endpoint detection in severely changing noise environments, and that it lays a good foundation for subsequent speech research.

Key Words: Speech Signal, Endpoint Detection, Short-Time Energy-Zero-Product, Discrimination Information
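The two-stage decision with a dynamically updated noise threshold described in the abstract can be sketched as follows. The update rule is not given in this excerpt, so everything specific here is an assumption made for illustration: the first few frames are treated as noise, the noise level is tracked by exponential smoothing with factor `alpha`, a frame is a speech candidate when its feature exceeds `ratio` times the noise level, and `recheck` is a hypothetical callback standing in for the discrimination-information test applied at transitions.

```python
def detect_speech_frames(features, init_noise_frames=5, ratio=3.0,
                         alpha=0.9, recheck=None):
    """Label each frame True (speech) or False (noise).

    features: per-frame values, e.g. the short-time energy-zero-product.
    recheck:  optional callable (frame_index, first_decision) -> bool that is
              consulted only when the decision differs from the previous frame.
    All parameters and the update rule are illustrative assumptions.
    """
    # Assume the recording starts with noise-only frames.
    noise_level = sum(features[:init_noise_frames]) / init_noise_frames
    labels = []
    previous = False
    for index, value in enumerate(features):
        is_speech = value > ratio * noise_level      # first-stage judgment
        if is_speech != previous and recheck is not None:
            is_speech = recheck(index, is_speech)    # second-stage recheck
        if not is_speech:
            # Track slowly changing noise using noise-labelled frames only.
            noise_level = alpha * noise_level + (1 - alpha) * value
        labels.append(is_speech)
        previous = is_speech
    return labels


# Hypothetical demo: quiet noise, a burst of speech-like frames, then noise.
demo = [1, 1, 1, 1, 1, 10, 10, 10, 1, 1]
print(detect_speech_frames(demo))
```

Updating the noise estimate only on frames labelled as noise keeps speech energy from inflating the threshold, while the smoothing factor controls how quickly the threshold follows a change in the noise floor.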


978-1-4799-7016-2/15/$31.00 © 2015 IEEE

