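The main purpose of speech recognition is to convert a speech signal into the corresponding text. A recognition system consists mainly of acoustic feature extraction, a language model, an acoustic model, and a decoder. During training, acoustic features extracted from raw speech waveforms are used to train the acoustic model. The acoustic model is then combined with a pronunciation lexicon and a language model to form a decoding network. At recognition time, features are extracted from new speech and scored by the acoustic model, and the recognition result is obtained by Viterbi decoding.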
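The final step above, Viterbi decoding, can be illustrated with a minimal sketch over a toy hidden Markov model (HMM). The sketch makes several assumptions: the acoustic model's per-frame log scores are already given as a plain matrix, there is no lexicon or language-model graph, and there is no beam pruning. The function name `viterbi` and the toy numbers are hypothetical. Real decoders search a much larger network composed from the lexicon and language model, and they prune aggressively.

```python
import math
from typing import List, Sequence, Tuple


def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the most likely HMM state sequence and its log score.

    log_init[s]     : log P(state s at frame 0)
    log_trans[p][s] : log P(state s | previous state p)
    log_emit[t][s]  : log acoustic score of frame t under state s (assumed given)
    """
    num_states = len(log_init)
    # delta[s]: best log score of any partial path ending in state s at the current frame
    delta = [log_init[s] + log_emit[0][s] for s in range(num_states)]
    backptrs: List[List[int]] = []
    for t in range(1, len(log_emit)):
        new_delta, ptrs = [], []
        for s in range(num_states):
            # Choose the predecessor that maximizes the path score into state s
            best_prev = max(range(num_states), key=lambda p: delta[p] + log_trans[p][s])
            new_delta.append(delta[best_prev] + log_trans[best_prev][s] + log_emit[t][s])
            ptrs.append(best_prev)
        delta = new_delta
        backptrs.append(ptrs)
    # Trace back from the best final state
    last = max(range(num_states), key=lambda s: delta[s])
    best_score = delta[last]
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best_score


if __name__ == "__main__":
    NEG_INF = -math.inf
    # Toy two-state left-to-right HMM (hypothetical numbers)
    init = [0.0, NEG_INF]
    trans = [[math.log(0.6), math.log(0.4)], [NEG_INF, 0.0]]
    emit = [[math.log(p) for p in frame]
            for frame in ([0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8])]
    path, score = viterbi(init, trans, emit)
    print(path, round(math.exp(score), 6))  # expected: [0, 0, 1, 1] 0.124416
```

The search works in the log domain, so products of probabilities become sums. This avoids numerical underflow on long utterances.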
IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 4, NO. 3, JULY 2017

Recent Progresses in Deep Learning Based Acoustic Models

Dong Yu and Jinyu Li

Abstract: In this paper, we summarize recent progress made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques. We first discuss models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) that can effectively exploit variable-length contextual information, and their various combinations with other models. We then describe models that are optimized end-to-end and emphasize feature representations learned jointly with the rest of the system, the connectionist temporal classification (CTC) criterion, and the attention-based sequence-to-sequence translation model. We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. We also cover modeling techniques that lead to more efficient decoding and discuss possible future directions in acoustic model research.

Index Terms: Attention model, convolutional neural network (CNN), connectionist temporal classification (CTC), deep learning (DL), long short-term memory (LSTM), permutation invariant training, speech adaptation, speech processing, speech recognition, speech separation.

Manuscript received April 11, 2017; accepted May 24, 2017. Recommended by Associate Editor Qinglai Wei. (Corresponding author: Dong Yu.)

Citation: D. Yu and J. Li, "Recent progresses in deep learning based acoustic models," IEEE/CAA J. of Autom. Sinica, vol. 4, no. 3, pp. 396-409, Jul. 2017.

D. Yu is with the Tencent AI Lab, Tencent, Bellevue, WA 98034, USA (e-mail: dongyu@ieee.org).

J. Li is with the Microsoft AI and Research, Microsoft, Redmond, WA 98052, USA (e-mail: jinyli@microsoft.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JAS.2017.7510508

I. Introduction

In the past several years, there has been significant progress in automatic speech recognition (ASR) [1]-[21]. This progress has led to ASR systems that surpass the threshold for adoption in many real-world scenarios. It has also enabled services such as Google Now, Microsoft Cortana, and Amazon Alexa. Many of these achievements are powered by deep learning (DL) techniques. Readers are referred to Yu and Deng 2014 [22] for a comprehensive summary and detailed description of the technology advancements in ASR made before 2015.

In this paper, we survey new developments from the past two years, with an emphasis on acoustic models. We discuss the motivations and core ideas of each interesting work surveyed.

More specifically, in Section II we illustrate improved DL/HMM (hidden Markov model) hybrid acoustic models that employ deep recurrent neural networks (RNNs) and deep convolutional neural networks (CNNs). These hybrid models can exploit contextual information better than feed-forward deep neural networks (DNNs), and thus lead to new state-of-the-art recognition accuracy.

In Section III we describe acoustic models that are designed and optimized end-to-end with few or no non-learnable components. We first discuss models in which audio waveforms are used directly as the input features, so that the feature representation layer is learned automatically instead of being designed manually. We then describe models that are optimized using the connectionist temporal classification (CTC) criterion, which allows a direct sequence-to-sequence mapping. Following that, we analyze sequence-to-sequence translation models that are built upon the attention mechanism.

Section IV is devoted to techniques that can improve robustness, with a focus on adaptation techniques, speech enhancement and separation, and robust training techniques. In Section V we describe acoustic models that support efficient decoding, covering frame skipping and model compression through teacher-student training and quantization. In Section VI we propose core problems to work on and potential future directions for solving them.

II. Acoustic Models Exploiting Variable-Length Contextual Information

The DL/HMM hybrid model [1]-[5] is the first deep learning architecture that succeeded in ASR, and it is still the dominant model used in industry. Several years ago, most hybrid systems were DNN based. As reported in [3], one of the important factors behind the superior performance of the DNN/HMM hybrid system is its ability to exploit contextual information. In most systems, a window of 9 to 13 frames of features (left/right context of 4-6 frames) is used as the input to the DNN. This lets the system exploit information from neighboring frames to improve accuracy.

However, the optimal length of contextual information may vary for different phones and speaking speeds. This indicates that a fixed-length context window, as used in the DNN/HMM hybrid system, may not be the best way to exploit contextual information. In recent years, researchers have proposed new models that can exploit variable-length contextual information more effectively. The two most important of these models use deep RNNs and CNNs.

A. Recurrent Neural Networks

Feed-forward DNNs only consider information in a fixed-length sliding window of frames, and thus cannot exploit long-range correlations in the speech signal. On the other hand, RNNs can encode sequence history in their internal states,
and thus have the potential to predict phonemes based on all the speech features observed up to the current frame. Unfortunately, simple RNNs may have gradients that either increase or decrease exponentially over time, depending on the largest eigenvalue of the state-update matrix. Hence, basic RNNs are difficult to train, and in practice they can only model short-range effects.

Long short-term memory (LSTM) RNNs [23] were developed to overcome these problems. LSTM-RNNs use input, output, and forget gates to control the information flow. As a result, gradients can be propagated in a stable fashion over a relatively longer span of time. These networks have been shown to outperform DNNs on a variety of ASR tasks [8], [24]-[27].

Note that there is another popular RNN model, called the gated recurrent unit (GRU). The GRU is simpler than the LSTM but is also able to model long short-term correlations. Although the GRU has been shown to be effective in several machine learning tasks [28], it is not widely used in ASR tasks.

At time step $t$, the computation of the LSTM units can be described by the following vector formulas:

$$i_t = \sigma\left(W_{ix} x_t + W_{ih} h_{t-1} + p_i \odot c_{t-1} + b_i\right) \tag{1a}$$

$$f_t = \sigma\left(W_{fx} x_t + W_{fh} h_{t-1} + p_f \odot c_{t-1} + b_f\right) \tag{1b}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \phi\left(W_{cx} x_t + W_{ch} h_{t-1} + b_c\right) \tag{1c}$$

$$o_t = \sigma\left(W_{ox} x_t + W_{oh} h_{t-1} + p_o \odot c_t + b_o\right) \tag{1d}$$

$$h_t = o_t \odot \phi(c_t) \tag{1e}$$

where $x_t$ is the input vector. The vectors $i_t$, $o_t$, and $f_t$ are the activations of the input, output, and forget gates, respectively. The $W_{\cdot x}$ and $W_{\cdot h}$ terms are the weight matrices for the inputs $x_t$ and the recurrent inputs $h_{t-1}$, respectively. The $p$ vectors are parameter vectors associated with peephole connections. The functions $\sigma$ and $\phi$ are the logistic sigmoid and the hyperbolic tangent nonlinearity, respectively. The operation $\odot$ represents element-wise multiplication of vectors.

It is popular to stack LSTM layers to obtain better modeling power [8]. However, an LSTM-RNN with too many vanilla LSTM layers is very hard to train, and the gradient vanishing issue still exists if the network goes too deep. This issue can be solved by using either a highway LSTM or a residual LSTM.

In the highway LSTM [29], the memory cells of adjacent layers are connected by gated direct links. These links provide a path for information to flow between layers more directly, without decay. Therefore, the highway LSTM alleviates the gradient vanishing issue and enables the training of much deeper LSTM-RNN networks.

The residual LSTM [30], [31] uses shortcut connections between LSTM layers, and hence also provides a way to alleviate the gradient vanishing problem. The highway LSTM uses gates to guide the information flow. The residual LSTM is more straightforward: it uses a direct shortcut path, similar to the residual CNN [32], which recently achieved great success in the image classification task.

Log Mel-filter-bank features are typically used as the input to neural-network-based acoustic models [33], [34]. Switching two filter-bank bins will not affect the performance of a DNN or an LSTM. However, this is not the case when a human reads a spectrogram: a human relies on patterns that evolve over both time and frequency to predict phonemes. This observation inspired the proposal of a 2-D, time-frequency (TF) LSTM [35], [36].

The TF-LSTM jointly scans the speech input over the time and frequency axes to model spectro-temporal warping. It then uses the output activation as the input to the traditional time LSTM. The joint time-frequency modeling provides better normalized features for the upper-layer time LSTMs. It has been verified to be effective and robust to distortion at both Microsoft and Google on large-scale tasks [35]-[37].

The highway LSTM has gates in both the temporal and spatial directions, while the TF-LSTM has gates in both the temporal and spectral directions. It is desirable to have a general LSTM structure that works along all directions. The Grid LSTM [38] is such a general LSTM: it arranges the LSTM memory cells into a multidimensional grid. It can be considered a unified way of fusing LSTMs for temporal, spectral, and spatial computation. The Grid LSTM has been studied for temporal and spatial computation in [39], and for temporal and spectral computation in [37].

Bi-directional LSTMs (BLSTMs) perform better than uni-directional LSTMs because they use both past and future context information [8], [40]. However, they are not suitable for real-time systems, since recognition can happen only after the whole utterance has been observed. For this reason, models that bridge uni-directional LSTMs and BLSTMs have been proposed. Examples include the latency-controlled BLSTM (LC-BLSTM) [29] and the row-convolution BLSTM (RC-BLSTM).

In these models, the forward LSTM is kept as is. The backward LSTM, however, is replaced in one of two ways:

- in the LC-BLSTM, by a backward LSTM with at most $N$ frames of lookahead;
- in the RC-BLSTM, by a row-convolution operation that integrates information from the $N$ lookahead frames.

By carefully choosing $N$, we can balance recognition accuracy against latency. Recently, the LC-BLSTM was improved in [41] to speed up evaluation and to enable real-time online speech recognition. The improvement uses a better network topology to initialize the BLSTM memory cells.

B. Convolutional Neural Networks

Another model that can effectively exploit variable-length contextual information is the convolutional neural network (CNN) [42]. At the center of the CNN is the convolution operation (or layer). For speech recognition, the input to the convolution operation is usually a three-dimensional tensor (row, column, channel). For other applications it can be a lower- or higher-dimensional tensor. Each channel of the input and output of the convolution operation can be considered a view of the same data. In most setups, all channels have the same size (height, width).

The filters in the convolution operation are called kernels. In our case they are four-dimensional tensors (kernel height, kernel width, input channel, output channel). There are $C_x \times C_y$ kernels in total, where $C_x$ is the number of input channels and $C_y$ is the number of output channels. The kernels are applied along all channels to local regions of the input image called receptive fields.
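The kernel layout just described can be illustrated with a minimal pure-Python sketch, in which `k[n][out_ch]` holds one 2-D kernel for each (input channel, output channel) pair, $C_x \times C_y$ kernels in all. Each output value sums, over every input channel, the inner product between one kernel and the input patch under its receptive field. The sketch rests on several assumptions:

- the convolution is "valid", with no padding;
- the kernel is applied as a cross-correlation, without flipping;
- the optional strides `stride_r` and `stride_c` are included for illustration.

The function name `conv2d_multichannel` and the nested-list layouts are hypothetical choices made for readability, not any library's API.

```python
from typing import List

# Hypothetical nested-list layouts used only in this sketch.
Tensor3 = List[List[List[float]]]          # [channel][row][col]
Kernels = List[List[List[List[float]]]]    # [in_channel][out_channel][kernel_row][kernel_col]


def conv2d_multichannel(x: Tensor3, k: Kernels,
                        stride_r: int = 1, stride_c: int = 1) -> Tensor3:
    """Valid multi-channel convolution (cross-correlation, no kernel flipping)."""
    c_in, c_out = len(k), len(k[0])            # C_x input and C_y output channels
    kh, kw = len(k[0][0]), len(k[0][0][0])     # kernel height and width
    rows, cols = len(x[0]), len(x[0][0])       # all input channels share one size
    out_rows = (rows - kh) // stride_r + 1
    out_cols = (cols - kw) // stride_c + 1
    y = [[[0.0] * out_cols for _ in range(out_rows)] for _ in range(c_out)]
    for out_ch in range(c_out):
        for i in range(out_rows):
            for j in range(out_cols):
                r0, c0 = i * stride_r, j * stride_c    # top-left corner of the receptive field
                acc = 0.0
                for n in range(c_in):                   # every input channel contributes
                    for a in range(kh):
                        for b in range(kw):
                            acc += k[n][out_ch][a][b] * x[n][r0 + a][c0 + b]
                y[out_ch][i][j] = acc
    return y


if __name__ == "__main__":
    # Two 3x3 input channels, one output channel, 2x2 kernels (hypothetical values).
    x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
         [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
    k = [[[[1, 0], [0, 1]]],     # kernel for input channel 0 -> output channel 0
         [[[1, 1], [1, 1]]]]     # kernel for input channel 1 -> output channel 0
    print(conv2d_multichannel(x, k))
    print(conv2d_multichannel(x, k, stride_r=2, stride_c=2))
```

With stride 1, the two 3x3 channels above produce one 2x2 output channel. With stride 2 in both directions, the output shrinks to 1x1. This shows how strides larger than 1 subsample the input as well as convolving it.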
The value after the convolution operation is

$$v_{ij}^{\ell}(\mathbf{K}, \mathbf{X}) = \sum_{n} \operatorname{vec}(\mathbf{K}_{n\ell}) \cdot \operatorname{vec}(\mathbf{X}_{ijn}) \tag{2}$$

for each output channel $\ell$ and input slice $(i, j)$, that is, the $i$th step along the vertical direction and the $j$th step along the horizontal direction. Here $\mathbf{K}_{n\ell}$, of size $(H_k, W_k)$, is the kernel matrix associated with input channel $n$ and output channel $\ell$. It has the same size as $\mathbf{X}_{ijn}$, the input image patch of channel $n$. $\operatorname{vec}(\cdot)$ is the vector formed by stacking all the columns of a matrix, and $\cdot$ is the inner product of two vectors.

It follows that each output pixel is a weighted sum of all pixels across all channels in an input patch. Each input pixel can be considered a weak pattern detector. Each output pixel is therefore a boosted detector that exploits all the information in the input patch.

The kernel is shared across all input patches. It moves along the input image with strides $S_r$ and $S_c$ in the vertical and horizontal directions, respectively. When the strides are larger than 1, the convolution operation subsamples the input image in addition to convolving it. This produces a lower-resolution image that is less sensitive to small pattern shifts inside the input patch.

The translational invariance can be further improved when some kind of aggregation operation is applied after the convolution operation. Typical aggregation operations are max-pooling and average-pooling. Aggregation operations are often combined with subsampling to reduce resolution.

Due to this built-in translational invariance, CNNs can exploit variable-length contextual information along both the frequency and time axes. However, if only one convolution layer is used, the translational variability that the system can tolerate is limited. To allow more powerful exploitation of variable-length contextual information, convolution operations (or layers) can be stacked.

The time delay neural network (TDNN) [43] was the first model that exploits multiple CNN layers for ASR. In this model, convolution operations are applied to both the time and frequency axes. However, the early TDNNs were neural-network-only solutions that did not integrate with HMMs, and they were hard to use in large vocabulary continuous speech recognition (LVCSR).

After the successful application of DNNs to LVCSR, CNNs were reintroduced under the DL/HMM hybrid model architecture [5], [7], [11], [14], [17], [44]-[46]. HMMs in the hybrid model already have a strong ability to handle the variable-length utterance problem in ASR. For this reason, CNNs were initially reintroduced to deal with variability along the frequency axis only [5], [7], [44], [45]. The goal was to improve robustness against vocal tract length variability between different speakers.

Only one or two CNN layers were used in these early models, stacked with additional fully-connected DNN layers. These models showed around 5% relative recognition error rate reduction compared to the DNN/HMM systems [7]. Later, additional RNN layers, e.g., LSTMs, were integrated into the model. This formed the so-called CNN-LSTM-DNN (CLDNN) [10] and CNN-DNN-LSTM (CDL) architectures. The CNNs in these models only deal with frequency-axis variability, so the RNNs help to exploit the variable-length contextual information. Both CLDNN and CDL achieved additional accuracy improvement over CNN-DNN models.

Researchers quickly realized that dealing with variable-length utterances is different from exploiting variable-length contextual information. TDNNs convolve along both the frequency and time axes and thus exploit variable-length contextual information. They attracted new attention, this time under the DL/HMM hybrid architecture [13], [47] and with variations such as row convolution [15] and the feedforward sequential memory network (FSMN) [16]. Similar to the original TDNNs, these models stack several CNN layers along the frequency and time axes, with a focus on the time axis, to account for speaking rate variation. Unlike the original TDNNs, however, the TDNN/HMM hybrid systems can recognize large vocabulary continuous speech very effectively.

More recently, primarily motivated by the successes in image recognition, various architectures of deep CNNs [14], [17], [46], [48] have been proposed and evaluated for ASR. The premise is that spectrograms can be seen as images with special patterns, from which experienced people can tell what has been said. In deep CNNs, each higher layer is a weighted sum of nonlinear transformations of a window of lower layers. Higher layers thus cover longer contexts and operate on more abstract patterns. Lower CNN layers capture simple local patterns, while higher CNN layers detect broader, more abstract, and more complicated patterns. Smaller kernels combined with more layers allow deep CNNs to exploit longer-range dependency information along both the time and frequency axes more effectively.

Empirically, deep CNNs are comparable to BLSTMs [19], which in turn outperform unidirectional LSTMs. However, BLSTMs suffer from long latency. Deep CNNs have limited latency and are better suited for real-time systems, provided the computation cost can be controlled.

Training and evaluating deep CNNs is very time consuming, especially if each window of frames is treated independently, because much of the computation is then duplicated. To speed up the computation, we can treat the whole utterance as a single input image and reuse the intermediate computation results. Even better, the deep CNN can be designed so that the stride at each layer is long enough to cover the whole kernel, similar to the CNNs with layer-wise context expansion and attention (LACE) [17]. Such a model, called a dilated CNN [46], can exploit longer-range information with fewer layers and can significantly reduce the computational cost. The dilated CNN has outperformed other deep CNN models on the Switchboard task [46].

Note that deep CNNs can be used together with RNNs under frameworks such as connectionist temporal classification (CTC), which we will discuss in Section III-B.

III. End-to-End Optimization

The models discussed in the previous section are DNN/HMM hybrid models, in which the two components, the DNN and the HMM, are usually optimized separately. However, speech recognition is a sequential recognition problem. It is