In-Depth: A Tutorial on Echo State Networks and Classical RNN Training Methods


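This tutorial is a detailed write-up of a five-hour class that Herbert Jaeger gave at AIS (the Fraunhofer Institute for Autonomous Intelligent Systems, Germany) in September and October 2002. It covers several classical methods for training recurrent neural networks (RNNs): backpropagation through time (BPTT), real-time recurrent learning (RTRL), and extended Kalman filtering (EKF) techniques. This material occupies Sections 2 through 5 and is dense and mathematically oriented. The remaining sections, 1 and 6 through 9, are gentler and more detailed, and are illustrated with simple examples. They are meant to work on their own as an introduction to the "echo state network" (ESN) approach to training RNNs. The author cautions that the document was converted from an HTML file to a Word file, so the formatting may be imperfect.

An echo state network is an RNN whose input and recurrent weights are generated randomly and then kept fixed; only the weights leading to the output units are trained. The network is set up so that the influence of its initial state fades over time, which leaves the internal state determined by the history of the input signal. ESNs have been applied to nonlinear sequence-prediction and time-series modeling tasks.

BPTT is the most widely used of the classical methods: it uses gradient information to update the weights so as to minimize a loss function. On long sequences, however, the gradients can vanish or explode, which makes long-term dependencies hard to learn. RTRL computes the gradient forward in time, so the weights can be adjusted online at every step, but its computational cost is high. EKF, a state-estimation method from control theory, combines state estimates with observed data and offers another route to online training of RNNs.

Overall, the tutorial gives a broad and practical view of how RNNs are trained, useful to beginners and experienced practitioners alike, and it presents echo state networks as an effective framework for sequence-modeling tasks.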

(1.7) $y(n+1) = f^{out}(W^{out}(u(n+1), x(n+1), y(n)))$

where (u(n+1),x(n+1),y(n)) denotes the concatenated vector made from input, internal, and output activation vectors. We will use output transfer functions $f^{out} = \tanh$ or $f^{out} = 1$ (the identity); in the latter case we have linear output units.
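To make the output equation (1.7) concrete, here is a minimal sketch in plain Python. The function name `esn_output`, the helper `identity`, and the toy sizes and weight values are assumptions chosen for illustration; they do not come from the tutorial.

```python
import math

def identity(a):
    """Linear output unit: f_out = 1."""
    return a

def esn_output(W_out, u_next, x_next, y_prev, f_out=math.tanh):
    """Compute y(n+1) = f_out(W_out (u(n+1), x(n+1), y(n))) as in (1.7).

    W_out is a list of rows, one row per output unit, with one weight
    for every entry of the concatenated vector.
    """
    z = list(u_next) + list(x_next) + list(y_prev)  # concatenated vector
    if any(len(row) != len(z) for row in W_out):
        raise ValueError("each row of W_out must match the concatenated vector")
    return [f_out(sum(w * v for w, v in zip(row, z))) for row in W_out]

# Toy sizes (assumed): 1 input unit, 2 internal units, 1 output unit.
W_out = [[0.5, -1.0, 2.0, 0.25]]
y_lin = esn_output(W_out, [1.0], [0.5, 0.25], [0.0], f_out=identity)
y_tanh = esn_output(W_out, [1.0], [0.5, 0.25], [0.0])  # tanh output unit
```

With these toy values the weighted sum is 0.5, so the linear unit returns 0.5 and the tanh unit returns tanh(0.5).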

1.4 Example: a little timer network

Consider the input-output task of timing. The input signal has two components. The first component $u_1(n)$ is 0 most of the time, but sometimes jumps to 1. The second input $u_2(n)$ can take values between 0.1 and 1.0 in increments of 0.1, and assumes a new (random) one of these values each time $u_1(n)$ jumps to 1. The desired output is 0.5 for $10 \times u_2(n)$ time steps after $u_1(n)$ was 1, else is 0. This amounts to implementing a timer: $u_1(n)$ gives the "go" signal for the timer, $u_2(n)$ gives the desired duration.
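The task description can be turned into a small data generator. The sketch below is one possible reading of it, not the author's code: the function name `make_timer_data`, the start probability `p_start`, the choice to restart the timer when a new "go" arrives while it is still running, and the random duration value held before the first "go" are all assumptions.

```python
import random

def make_timer_data(n_steps, p_start=0.02, seed=0):
    """Return (u1, u2, d): go signal, duration setting, desired output."""
    rng = random.Random(seed)
    u1 = [0.0] * n_steps
    u2 = [0.0] * n_steps
    d = [0.0] * n_steps
    k = rng.randint(1, 10)   # duration in tenths (assumed initial value)
    remaining = 0            # time steps for which the output stays at 0.5
    for n in range(n_steps):
        if remaining > 0:    # a timer started at an earlier step is running
            d[n] = 0.5
            remaining -= 1
        if rng.random() < p_start:   # "go" signal at this step
            u1[n] = 1.0
            k = rng.randint(1, 10)   # new random duration setting
            remaining = k            # output is 0.5 for the next 10 * u2 steps
        u2[n] = k / 10.0
    return u1, u2, d

u1, u2, d = make_timer_data(300, seed=3)
```

The output is 0.5 on the steps following a "go" and 0 elsewhere, so a "go" at step n with duration setting 0.3 produces 0.5 at steps n+1, n+2, and n+3.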

Figure 1.6: Schema of the timer network. Input 1: start signals; input 2: duration setting; output: rectangular signals of desired duration.

The following figure shows traces of input and output generated by an RNN trained on this task according to the ESN approach:

Figure 1.7: Performance of an RNN trained on the timer task. Solid line in last graph: desired (teacher) output. Dotted line: network output.

Clearly this task requires the RNN to act as a memory: it has to retain information about the "go" signal for several time steps. This is possible because the internal recurrent connections let the "go" signal "reverberate" in the internal units' activations. Generally, tasks requiring some form of memory are candidates for RNN modeling.
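The "reverberation" of an input pulse can be seen in a very small recurrent network. The sketch below is only an illustration under stated assumptions: it uses a hand-picked three-unit ring with weight 0.5 instead of a trained or random network, a tanh state update with no output feedback, and the hypothetical names `step` and `impulse_response`.

```python
import math

# Hypothetical 3-unit ring: W[i][j] is the weight from unit j to unit i,
# so activity travels 0 -> 2 -> 1 -> 0 and shrinks on every hop.
W = [[0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5],
     [0.5, 0.0, 0.0]]
W_in = [1.0, 0.0, 0.0]   # the input drives unit 0 only

def step(x, u):
    """One state update: x(n+1) = tanh(W x(n) + W_in u(n+1))."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + w_in * u)
            for row, w_in in zip(W, W_in)]

def impulse_response(n_steps):
    """Feed a single pulse at step 0 and record the largest |activation|."""
    x = [0.0, 0.0, 0.0]
    trace = []
    for n in range(n_steps):
        u = 1.0 if n == 0 else 0.0
        x = step(x, u)
        trace.append(max(abs(v) for v in x))
    return trace

trace = impulse_response(10)
```

Although the input is zero after the first step, the activations stay nonzero and fade gradually, which is the kind of short-term memory the timer task relies on.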

2. Standard training techniques for RNNs

During the last decade, several methods for supervised training of RNNs have been explored. In this tutorial we present the currently most important ones: backpropagation through time (BPTT), real-time recurrent learning (RTRL), and extended Kalman filtering based techniques (EKF). BPTT is probably the most widely used, RTRL is the mathematically most straightforward, and EKF is (arguably) the technique that gives best results.

2.1 Backpropagation revisited

BPTT is an adaptation of the well-known backpropagation training method for feedforward networks, where backpropagation is the most commonly used training method. We start with a recap of this method.

We consider a multi-layer perceptron (MLP) with k hidden layers. Together with the layer of input units and the layer of output units this gives k+2 layers of units altogether (Fig. 1.1, left, shows an MLP with two hidden layers), which we number by 0, ..., k+1. The number of input units is K, of output units L, and of units in hidden layer m is $N^m$. The weight of the connection from the j-th unit in layer m to the i-th unit in layer m+1 is denoted by $w_{ij}^m$. The activation of the i-th unit in layer m is $x_i^m$ (for m = 0 this is an input value, for m = k+1 an output value).

The training data for a feedforward network training task consist of T input-output (vector-valued) data pairs

(2.1) $u(n) = (x_1^0(n), \ldots, x_K^0(n))', \quad d(n) = (d_1(n), \ldots, d_L(n))', \quad n = 1, \ldots, T$

where n denotes training instance, not time. The activation of non-input units is computed according to

(2.2) $x_i^{m+1}(n) = f\left(\sum_{j=1,\ldots,N^m} w_{ij}^m \, x_j^m(n)\right)$

(Standardly one also has bias terms, which we omit here). Presented with teacher input u(n), the previous update equation is used to compute activations of units in subsequent hidden layers, until a network response

(2.3) $y(n) = (x_1^{k+1}(n), \ldots, x_L^{k+1}(n))'$

is obtained at the output layer.
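The layer-by-layer computation of (2.2) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than code from the tutorial: the function name `mlp_forward`, the nested-list weight layout, and the toy weights are assumptions, and bias terms are omitted as in the text.

```python
import math

def mlp_forward(weights, u, f=math.tanh):
    """Forward pass through an MLP following (2.2).

    weights[m][i][j] holds the weight from unit j in layer m to unit i
    in layer m+1, for m = 0, ..., k. Returns the activations of every
    layer; the last entry is the network response y(n).
    """
    x = list(u)                      # layer 0: the input values
    activations = [x]
    for W in weights:                # one weight matrix per layer transition
        x = [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]
        activations.append(x)
    return activations

# Toy network (assumed): K = 2 inputs, one hidden layer of 2 units, L = 1 output.
weights = [
    [[1.0, 1.0], [1.0, -1.0]],   # layer 0 -> layer 1
    [[2.0, 3.0]],                # layer 1 -> layer 2
]
linear = mlp_forward(weights, [1.0, 2.0], f=lambda a: a)
squashed = mlp_forward(weights, [1.0, 2.0])
```

With the identity as transfer function the hidden activations are [3.0, -1.0] and the output is 2*3 + 3*(-1) = 3.0; with tanh, the same weights give values squashed into (-1, 1).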
