The Deep Tensor Neural Network With Applications
to Large Vocabulary Speech Recognition
Dong Yu, Senior Member, IEEE, Li Deng, Fellow, IEEE, and Frank Seide, Member, IEEE
Abstract—The recently proposed context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have been shown to be highly promising for large vocabulary speech recognition. In this paper, we develop a more advanced type of DNN, which we call the deep tensor neural network (DTNN). The DTNN extends the conventional DNN by replacing one or more of its layers with a double-projection (DP) layer, in which each input vector is projected into two nonlinear subspaces, and a tensor layer, in which the two subspace projections interact with each other and jointly predict the next layer in the deep architecture. In addition, we describe an approach to map the tensor layers to conventional sigmoid layers so that the former can be treated and trained in a similar way to the latter. With this mapping we can consider a DTNN as a DNN augmented with DP layers, so that not only can the BP learning algorithm of DTNNs be cleanly derived, but new types of DTNNs can also be more easily developed. Evaluation on Switchboard tasks indicates that DTNNs can outperform the already high-performing DNNs, with 4–5% and 3% relative word-error reductions on the 30-hr and 309-hr training sets, respectively.
Index Terms—Automatic speech recognition, CD-DNN-HMM, large vocabulary, tensor deep neural networks.
I. INTRODUCTION
RECENTLY, the context-dependent deep neural network hidden Markov model (CD-DNN-HMM) was developed for large vocabulary speech recognition (LVSR) and has been successfully applied to a variety of large-scale tasks by a number of research groups worldwide [2]–[9]. The CD-DNN-HMM adopts and extends the earlier artificial neural network (ANN) HMM hybrid system framework [10]–[12]. In CD-DNN-HMMs, DNNs—multilayer perceptrons (MLPs) with many hidden layers—replace Gaussian mixture models (GMMs) and directly approximate the emission probabilities of the tied triphone states (also called senones). In the first set of successful experiments, CD-DNN-HMMs were shown to achieve 16% [2], [3] and 33% [4]–[6] relative recognition error reduction over strong, discriminatively trained
CD-GMM-HMMs, respectively, on a large-vocabulary voice
search (VS) task [13] and the Switchboard (SWB) phone-call
transcription task [14]. Subsequent work on Google voice
search and YouTube data [7] and on Broadcast News [8], [9]
confirmed the effectiveness of CD-DNN-HMMs for large
vocabulary speech recognition.
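To make the role of the DNN in this hybrid framework concrete, the following minimal NumPy sketch shows the standard posterior-to-likelihood conversion used by ANN/HMM hybrid systems. The function name and array shapes are illustrative assumptions, not code from the systems cited above.

```python
import numpy as np

def senone_log_likelihoods(log_posteriors, log_priors):
    """Convert DNN senone posteriors into the scaled emission
    likelihoods consumed by the HMM decoder. In the hybrid framework,
    p(x | s) is proportional to p(s | x) / p(s), so in the log domain
    the senone log-prior is simply subtracted from the log-posterior.

    log_posteriors : (T, S) array of log p(s | x_t) from the DNN
    log_priors     : (S,)  array of log p(s), typically estimated
                     from the state-level alignment of training data
    """
    return log_posteriors - log_priors  # broadcast over the T frames
```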
In this work, we extend the DNN to a novel deep tensor neural network (DTNN) in which one or more layers are double-projection (DP) and tensor layers (see Section III for the explanation). The basic idea of the DTNN is motivated by the assumption that the underlying factors affecting the observed acoustic signals of speech, such as the spoken words, the speaker identity, and noise and channel distortion, can be factorized and approximately represented as interactions between two nonlinear subspaces. This type of multi-way interaction was hypothesized and explored in neuroscience as a model for the central nervous system [15], which conceptually characterizes brain function in terms of functional geometries, via metric tensors, in the internal representation spaces of the central nervous system, in both sensorimotor and connected manifolds. In the DTNN, we represent the hidden, underlying factors by projecting the input onto two separate subspaces through a double-projection (DP) layer in the otherwise conventional DNN. We then model the interactions between these two subspaces and the output neurons through a tensor with three-way connections. We propose a novel approach to reduce the tensor layer to a conventional sigmoid layer so that the model can be better understood and the decoding and learning algorithms can be cleanly developed. Based on this reduction, we also introduce alternative types of DTNNs. We empirically compare the conventional DNN and the new DTNN on the MNIST handwritten digit recognition task and the SWB phone-call transcription task [14]. The experimental results demonstrate that the DTNN generally outperforms the conventional DNN.
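To make the DP and tensor layers concrete, the following NumPy sketch gives one plausible reading of the forward computation just described, together with the reduction of the tensor layer to a conventional sigmoid layer. Bias terms are omitted, and all names and shapes here are illustrative assumptions; the exact formulation is given in Section III.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dp_tensor_forward(v, W1, W2, U):
    """Forward pass through a DP layer followed by a tensor layer.

    v  : input vector, shape (d,)
    W1 : projection onto subspace 1, shape (p, d)
    W2 : projection onto subspace 2, shape (q, d)
    U  : third-order weight tensor, shape (p, q, k)
    """
    # Double projection: two nonlinear views of the same input.
    h1 = sigmoid(W1 @ v)  # shape (p,)
    h2 = sigmoid(W2 @ v)  # shape (q,)

    # Tensor layer: each output unit k sums over all (i, j) pairs,
    # z_k = sum_ij U[i, j, k] * h1[i] * h2[j].
    z = np.einsum('i,j,ijk->k', h1, h2, U)
    return sigmoid(z)

def dp_tensor_forward_unfolded(v, W1, W2, U):
    """Equivalent computation after unfolding the tensor layer into a
    conventional sigmoid layer: the Kronecker product of the two
    projections becomes the input to an ordinary weight matrix."""
    p, q, k = U.shape
    h1 = sigmoid(W1 @ v)
    h2 = sigmoid(W2 @ v)
    h = np.kron(h1, h2)        # h[i*q + j] = h1[i] * h2[j], shape (p*q,)
    W = U.reshape(p * q, k).T  # unfolded tensor, shape (k, p*q)
    return sigmoid(W @ h)      # identical output to dp_tensor_forward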
This paper is organized as follows. We briefly review the related work in Section II and introduce the general architecture of the DTNN in Section III, in which the detailed components of the DTNN and the forward computations are also described. Section IV is dedicated to the algorithms we developed in this work for learning the DTNN weight matrices and tensors. The experimental results on the MNIST digit recognition task and the SWB task are presented and analyzed in Section V. We conclude the paper in Section VI.
II. RELATED WORK
In recent years, an extension from matrix to tensor has been proposed to model three-way interactions and to improve the