Technische Universität München
Fakultät für Informatik
Lehrstuhl VI: Echtzeitsysteme und Robotik
Supervised Sequence Labelling
with Recurrent Neural Networks
Alex Graves
Complete reprint of the dissertation approved by the Fakultät für Informatik of the Technische Universität München for the award of the academic degree of Doktor der Naturwissenschaften (Dr. rer. nat.).
Chair: Univ.-Prof. B. Brügge, Ph.D.
Examiners of the dissertation: 1. Univ.-Prof. Dr. H. J. Schmidhuber
2. Univ.-Prof. Dr. St. Kramer
The dissertation was submitted to the Technische Universität München on 14.01.2008 and accepted by the Fakultät für Informatik on 19.06.2008.

Abstract
Recurrent neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams of labels. Long short-term memory is an especially promising recurrent architecture, able to bridge long time delays between relevant input and output events, and thereby access long range context. The aim of this thesis is to advance the state-of-the-art in supervised sequence labelling with recurrent networks in general, and long short-term memory in particular. Its two main contributions are (1) a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and (2) an extension of long short-term memory to multidimensional data, such as images and video sequences. Experimental results are presented on speech recognition, online and offline handwriting recognition, keyword spotting, image segmentation and image classification, demonstrating the advantages of advanced recurrent networks over other sequential algorithms, such as hidden Markov models.
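As a concrete illustration of the first contribution, the following sketch (not part of the thesis text) shows how a bidirectional recurrent labeller can be trained with a CTC-style output layer when the alignment between inputs and labels is unknown. The framework (PyTorch), the network sizes, the feature dimension and the label counts are all assumptions chosen purely for illustration.

```python
# Minimal sketch, assuming PyTorch: a bidirectional LSTM sequence labeller
# trained with a CTC loss, so no frame-level alignment is required.
import torch
import torch.nn as nn

class BLSTMLabeller(nn.Module):
    def __init__(self, n_features, n_hidden, n_labels):
        super().__init__()
        # Bidirectional LSTM gives access to past and future context at every frame.
        self.rnn = nn.LSTM(n_features, n_hidden, bidirectional=True)
        # Linear layer maps to label scores plus one extra 'blank' class used by CTC.
        self.out = nn.Linear(2 * n_hidden, n_labels + 1)

    def forward(self, x):                       # x: (T, N, n_features)
        h, _ = self.rnn(x)
        return self.out(h).log_softmax(dim=-1)  # (T, N, n_labels + 1)

model = BLSTMLabeller(n_features=26, n_hidden=100, n_labels=61)  # hypothetical sizes
ctc_loss = nn.CTCLoss(blank=61)                 # blank is the last output index

x = torch.randn(75, 4, 26)                      # 75 frames, batch of 4 sequences
targets = torch.randint(0, 61, (4, 20))         # unaligned target label sequences
input_lengths = torch.full((4,), 75, dtype=torch.long)
target_lengths = torch.full((4,), 20, dtype=torch.long)

loss = ctc_loss(model(x), targets, input_lengths, target_lengths)
loss.backward()                                 # gradients flow back through the RNN
```

In a setup of this kind the network is trained directly on label sequences; the CTC objective sums over all possible alignments, so no per-frame segmentation of the training data is needed.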

Acknowledgements
I would like to thank my supervisor Jürgen Schmidhuber for his guidance
and support. I would also like to thank my co-authors Santi, Tino, Nicole
and Doug, and everyone else at IDSIA for making it a stimulating and
creative place to work. Thanks to Tom Schaul for proofreading the thesis,
and Marcus Hutter for his mathematical assistance during the connectionist
temporal classification chapter. I am grateful to Marcus Liwicki and Horst
Bunke for their expert collaboration on handwriting recognition. A special
mention goes to Fred and Matteo and all the other Idsiani who helped me
find the good times in Lugano. Most of all, I would like to thank my family
and my wife Alison for their constant encouragement, love and support.
This research was supported in part by the Swiss National Foundation,
under grants 200020-100249, 200020-107534/1 and 200021-111968/1.

Contents
Abstract ii
Acknowledgements iii
Contents iv
List of Tables vii
List of Figures viii
List of Algorithms x
1 Introduction 1
1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Overview of Thesis . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Supervised Sequence Labelling 4
2.1 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Pattern Classification . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Probabilistic Classification . . . . . . . . . . . . . . . . 6
2.2.2 Training Probabilistic Classifiers . . . . . . . . . . . . 6
2.2.3 Generative and Discriminative Models . . . . . . . . . 7
2.3 Sequence Labelling . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.1 A Taxonomy of Sequence Labelling Tasks . . . . . . . 9
2.3.2 Sequence Classification . . . . . . . . . . . . . . . . . . 9
2.3.3 Segment Classification . . . . . . . . . . . . . . . . . . 11
2.3.4 Temporal Classification . . . . . . . . . . . . . . . . . 12
3 Neural Networks 13
3.1 Multilayer Perceptrons . . . . . . . . . . . . . . . . . . . . . . 13
3.1.1 Forward Pass . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.2 Output Layers . . . . . . . . . . . . . . . . . . . . . . 16
3.1.3 Objective Functions . . . . . . . . . . . . . . . . . . . 17
3.1.4 Backward Pass . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Recurrent Neural Networks . . . . . . . . . . . . . . . . . . . 20
3.2.1 Forward Pass . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.2 Backward Pass . . . . . . . . . . . . . . . . . . . . . . 22
3.2.3 Bidirectional RNNs . . . . . . . . . . . . . . . . . . . 22
3.2.4 Sequential Jacobian . . . . . . . . . . . . . . . . . . . 25
3.3 Network Training . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3.1 Gradient Descent Algorithms . . . . . . . . . . . . . . 27
3.3.2 Generalisation . . . . . . . . . . . . . . . . . . . . . . 28
3.3.3 Input Representation . . . . . . . . . . . . . . . . . . . 30
3.3.4 Weight Initialisation . . . . . . . . . . . . . . . . . . . 31
4 Long Short-Term Memory 32
4.1 The LSTM Architecture . . . . . . . . . . . . . . . . . . . . . 33
4.2 Influence of Preprocessing . . . . . . . . . . . . . . . . . . . . 35
4.3 Gradient Calculation . . . . . . . . . . . . . . . . . . . . . . . 36
4.4 Architectural Enhancements . . . . . . . . . . . . . . . . . . . 36
4.5 LSTM Equations . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5.1 Forward Pass . . . . . . . . . . . . . . . . . . . . . . . 38
4.5.2 Backward Pass . . . . . . . . . . . . . . . . . . . . . . 39
5 Framewise Phoneme Classification 40
5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 40
5.2 Network Architectures . . . . . . . . . . . . . . . . . . . . . . 41
5.2.1 Computational Complexity . . . . . . . . . . . . . . . 42
5.2.2 Range of Context . . . . . . . . . . . . . . . . . . . . . 42
5.2.3 Output Layers . . . . . . . . . . . . . . . . . . . . . . 42
5.3 Network Training . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3.1 Retraining . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4.1 Comparison with Previous Work . . . . . . . . . . . . 46
5.4.2 Effect of Increased Context . . . . . . . . . . . . . . . 47
5.4.3 Weighted Error . . . . . . . . . . . . . . . . . . . . . . 48
6 Hidden Markov Model Hybrids 50
6.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.2 Experiment: Phoneme Recognition . . . . . . . . . . . . . . . 52
6.2.1 Experimental Setup . . . . . . . . . . . . . . . . . . . 52
6.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . 53
7 Connectionist Temporal Classification 54
7.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
7.2 From Outputs to Labellings . . . . . . . . . . . . . . . . . . . 55
7.2.1 Role of the Blank Labels . . . . . . . . . . . . . . . . . 57
7.3 CTC Forward-Backward Algorithm . . . . . . . . . . . . . . . 57