Phonetic Search Methods for Large Speech Databases


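Phonetic search methods for large speech databases are an important research topic in electrical and computer engineering, and the subject has a place in the SpringerBriefs in Speech Technology series. The series editor is Amy Neustein, and the series aims to select the most innovative scientists from academia and private industry, whose work is known for its novelty, practicality, and real-world application in delivering a broad range of speech solutions.

SpringerBriefs in Speech Technology aims to share the latest findings in speech technology through thorough literature reviews and empirical studies conducted in the laboratory and in real-world settings. The series covers several key areas, such as spoken dialog systems for real-time business applications, modern methods of speech parameterization, developments in information security for automated speech, and forensic applications of voiceprint (speaker) recognition. It also examines the use of advanced speech analytics in call centers and new ways of improving human-computer interaction with sophisticated algorithms.

In academia, researchers may explore phoneme recognition techniques based on statistical or deep learning methods, such as continuous Hidden Markov Models (HMMs), Deep Neural Networks (DNNs), or end-to-end models. In the private sector, these methods may be applied in intelligent voice assistants, smart home control systems, and even in-car voice command recognition.

When handling large amounts of speech data, phonetic search methods emphasize efficiency and accuracy, particularly in noisy environments or where accents and dialects differ. This typically involves preprocessing steps such as noise removal, signal enhancement, and feature extraction, so that speech can be converted into phoneme or syllable units suitable for comparison. In addition, fuzzy matching and approximate search algorithms are used to quickly find similar speech segments in large databases.

Phonetic Search Methods for Large Speech Databases addresses both the underlying theory and its practical application, providing a solid foundation for the development of speech technology. As the technology advances, these methods continue to be refined to keep pace with changing market demands and user expectations, improving the human-computer interaction experience.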
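The approximate matching mentioned in the description can be illustrated with a minimal sketch. It assumes the speech has already been decoded into a flat list of phoneme labels and uses a standard edit-distance dynamic program that lets a match start at any position in the stream. The function name `approx_find`, the phoneme labels, and the unit edit costs are illustrative assumptions, not the method used in the book.

```python
from typing import List, Tuple


def approx_find(keyword: List[str], stream: List[str],
                max_cost: int) -> List[Tuple[int, int]]:
    """Return (end_index, cost) for each stream position where some
    substring ending there matches the keyword within max_cost edits."""
    m = len(keyword)
    # prev[i]: cheapest way to align keyword[:i] with a substring that
    # ends just before the current stream position.
    prev = list(range(m + 1))
    hits = []
    for j, phone in enumerate(stream):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            substitute = prev[i - 1] + (keyword[i - 1] != phone)
            extra_in_stream = prev[i] + 1      # stream has an inserted phoneme
            missing_in_stream = cur[i - 1] + 1  # stream dropped a keyword phoneme
            cur[i] = min(substitute, extra_in_stream, missing_in_stream)
        if cur[m] <= max_cost:
            hits.append((j, cur[m]))
        prev = cur
    return hits


# Hypothetical phoneme stream containing "k ae t" at indices 2..4.
stream = ["dh", "ah", "k", "ae", "t", "s", "ae", "t"]
keyword = ["k", "ae", "t"]
print(approx_find(keyword, stream, max_cost=0))  # exact match ends at index 4
print(approx_find(keyword, stream, max_cost=1))  # also admits one-edit matches
```

Raising `max_cost` trades precision for recall: more recognition errors are tolerated, at the price of more false alarms. The loop costs time proportional to the keyword length times the stream length, which is why large databases motivate reducing the search space.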

Contents

1 Keyword Spotting Out of Continuous Speech

1.1 Introduction

1.2 Problem Formulation: KWS in Large Speech Databases

1.3 Target Applications of Keyword Spotting

2 Keyword Spotting Methods

2.1 LVCSR-Based KWS

2.2 Acoustic KWS

2.3 Phonetic Search KWS

2.4 Discussion: Why Phonetic Search?

2.4.1 Response Time

2.4.2 KWS Performance

2.4.3 Keyword Flexibility

3 Phonetic Search

3.1 The Search Mechanism

3.2 Using Phonetic Search for KWS

3.3 Computational Complexity Analysis

4 Search Space Complexity Reduction

4.1 Overview

4.2 Complexity Reduction in Phonetic Search

4.3 Anchor-Based Phonetic Search

5 Evaluating Phonetic Search KWS

5.1 Performance Metrics

5.2 Evaluation Process

5.3 Evaluation Databases

6 Evaluation Results

6.1 Exhaustive Search

6.1.1 Textual Benchmark

Chapter 1

Keyword Spotting Out of Continuous Speech

1.1 Introduction

Successful Automatic Speech Recognition (ASR) technology has been a research aspiration for the past five decades. Ideally, computers would be able to transform any type of human speech into an accurate textual transcription. Today's ASR technology generates fairly good results using structured speech with relatively high Signal to Noise Ratios (SNR), but performance degrades when using spontaneous speech in real-life noisy environments (Murveit et al. 1992; Young 1996; Furui 2003; Deng and Huang 2004). Performance that is acceptable for commercial applications can be achieved using large training corpora of speech and text. However, there are still problems that need to be resolved.

One of the main problems is the mismatch between training and testing (real-life) conditions (Young 1996; Baker et al. 2009; Tsao et al. 2009; Furui et al. 2012; Saon and Chien 2012). Types of mismatches include: background noise, channel distortion, Out of Vocabulary (OOV) words (when speakers use words not in the recognition vocabulary), foreign accent speech, etc. Various methods and algorithms for minimizing this mismatch between training and testing have been suggested and implemented (Mammone et al. 1996; Sankar and Lee 1996; Huo et al. 1997; Matrouf and Gauvain 1997; Viikki and Laurila 1998; Hirsch and Pearce 2000; Barras et al. 2002; Parada et al. 2010; Kai et al. 2012), while in parallel, larger amounts of representative speech (usually from live deployments) have been injected into the training process using automatic procedures that do not necessitate manual transcription of the data (Kamm and Meyer 2002; Evermann et al. 2005; Heigold et al. 2012).

The leading approach in ASR today is searching for the most probable sequence of words that describes the input speech. The search uses: (1) acoustical models representing the phonemes of the target language; (2) a lexicon of the recognition vocabulary words represented as sequences of phonemes; and (3) a Language Model (LM) specifying the word transition probabilities. ASR is performed by inputting a sequence of vectors estimated from the input speech signal to the

A. Moyal et al., Phonetic Search Methods for Large Speech Databases, SpringerBriefs in Speech Technology, DOI 10.1007/978-1-4614-6489-1_1, The Author(s) 2013
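As a rough illustration of how the three knowledge sources named in the introduction (acoustic models, a lexicon, and a language model) combine when searching for the most probable word sequence, the following toy sketch scores candidate word sequences by adding acoustic and language-model log-probabilities. Everything in it is an assumption made for illustration: the tiny lexicon, the bigram probabilities, the per-frame phoneme likelihoods, the one-frame-per-phoneme alignment, and the exhaustive enumeration, which a practical recognizer would replace with an efficient search such as Viterbi decoding.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

# (2) Lexicon: words as phoneme sequences (hypothetical entries).
LEXICON: Dict[str, List[str]] = {
    "a": ["ah"],
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
}

# (3) Bigram language model P(word | previous word); "<s>" marks the start.
LM: Dict[Tuple[str, str], float] = {
    ("<s>", "a"): 0.9,
    ("a", "cat"): 0.6,
    ("a", "cap"): 0.3,
}

# (1) Acoustic scores: likelihood of each frame given a phoneme (made up).
FRAMES: List[Dict[str, float]] = [
    {"ah": 0.8, "ae": 0.2},
    {"k": 0.9, "t": 0.1},
    {"ae": 0.7, "ah": 0.3},
    {"t": 0.4, "p": 0.6},  # acoustics alone slightly prefer "p"
]


def decode(frames, lexicon, lm, max_words=2, floor=1e-6):
    """Return the word sequence maximizing log P(O|W) + log P(W)."""
    best_words, best_score = None, -math.inf
    for n in range(1, max_words + 1):
        for words in product(lexicon, repeat=n):
            phones = [p for w in words for p in lexicon[w]]
            if len(phones) != len(frames):
                continue  # toy alignment: exactly one frame per phoneme
            acoustic = sum(math.log(f.get(p, floor))
                           for f, p in zip(frames, phones))
            history = ("<s>",) + words[:-1]
            language = sum(math.log(lm.get((h, w), floor))
                           for h, w in zip(history, words))
            score = acoustic + language
            if score > best_score:
                best_words, best_score = words, score
    return best_words, best_score


words, score = decode(FRAMES, LEXICON, LM)
print(" ".join(words), round(score, 3))
```

With these made-up numbers the last frame acoustically favors "p", yet the language model makes "a cat" the overall winner. Passing an empty language model (`decode(FRAMES, LEXICON, {})`) removes that preference, and the acoustically better "a cap" is selected instead.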
