Decoding-graph creation recipe (test time)
Here we explain our standard graph-creation approach step by step, together with the data-preparation
stages related to it.
Most of the details of this approach are not hardcoded into our tools; we are simply explaining how it is
currently done. If this section is confusing, the best remedy is probably to read "Speech
Recognition with Weighted Finite-State Transducers" by Mohri et al. Be warned: that paper is quite
long, and reading it will take at least a few hours for those not already familiar with FSTs. Another good
resource is the OpenFst website, which provides more context on topics such as symbol tables.
Preparing the initial symbol tables
We need to prepare the OpenFst symbol tables words.txt and phones.txt. These assign integer IDs to all
the words and phones in our system. Note that OpenFst reserves symbol id zero for epsilon. The symbol
tables for the WSJ task, for example, are plain text files with one symbol and its integer id per line.
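As a sketch of the format (the entries below are made up for illustration; the real WSJ words.txt is many thousands of lines), a words.txt in OpenFst text symbol-table format might begin like this:

```shell
# Illustrative words.txt in OpenFst text symbol-table format:
# one "symbol integer-id" pair per line, with id 0 reserved for epsilon.
# These entries are hypothetical; the real WSJ table is much larger,
# with "#0" as the last-numbered entry.
cat > words.txt <<'EOF'
<eps> 0
A 1
AARON 2
ABANDON 3
#0 4
EOF
head -n 1 words.txt
```

The same two-column format is used for phones.txt, again with `<eps>` as symbol zero.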
The words.txt file contains a single disambiguation symbol, "#0" (used for the epsilon on the input of
G.fst); it is the last-numbered word in our recipe. Be careful with this if your lexicon happens to contain
a word "#0". The phones.txt file does not contain disambiguation symbols, but after creating L.fst we
will create a file phones_disambig.txt that includes the disambiguation symbols (this is mainly useful
for debugging).
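The relationship between the two phone tables can be sketched as follows (the phone set and ids here are hypothetical, not the actual WSJ tables): phones_disambig.txt is simply phones.txt with the #N disambiguation symbols appended, numbered after the real phones.

```shell
# Illustrative phones.txt (made-up phones and ids; id 0 is epsilon):
cat > phones.txt <<'EOF'
<eps> 0
AA 1
AE 2
EOF

# phones_disambig.txt adds the disambiguation symbols #0, #1, ...
# after the real phones, continuing the numbering:
cp phones.txt phones_disambig.txt
cat >> phones_disambig.txt <<'EOF'
#0 3
#1 4
EOF
tail -n 2 phones_disambig.txt
```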
Preparing the lexicon L
First we create a lexicon in text format, initially without disambiguation symbols; each line contains a
word followed by its pronunciation as a sequence of phones. Our C++ tools never interact with this file;
it is used only by a script that creates the lexicon FST. Our WSJ lexicon is in this format.
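For illustration (these entries are made up in the spirit of the WSJ lexicon, not copied from it), the text lexicon might contain lines like the following; note that a word with multiple pronunciations simply appears on multiple lines:

```shell
# Illustrative lexicon.txt: one pronunciation per line,
# "word phone1 phone2 ..." (hypothetical entries).
cat > lexicon.txt <<'EOF'
A AH
A EY
ABANDON AH B AE N D AH N
EOF
grep -c '^A ' lexicon.txt   # prints 2: two pronunciations for "A"
```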