Recurrent Neural Network Training Tutorial


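"This tutorial gives a detailed introduction to training recurrent neural networks. It covers backpropagation through time (BPTT), real-time recurrent learning (RTRL) and extended Kalman filtering (EKF) methods, with particular attention to training echo state networks (ESNs). The author, Herbert Jaeger, first presented it in 2002 as a 5-hour course at the Fraunhofer Institute for Autonomous Intelligent Systems (AIS) and has revised it several times since."

In deep learning, recurrent neural networks (RNNs) are a special type of neural network that can process sequential data such as natural language, time series or audio signals, because their recurrent connections let them retain information about past inputs. This tutorial examines the key techniques for training RNNs.

Backpropagation through time (BPTT) is one of the most widely used methods. It unfolds the network in time into a layered feedforward network, one copy per time step. It then applies standard backpropagation to the unfolded network, propagating errors backward from later to earlier time steps to obtain the weight updates. This handles temporal dependencies directly, but the propagated gradients can vanish or explode.

Real-time recurrent learning (RTRL) is an online algorithm that updates the weights at every time step without a full pass over the sequence. It carries forward the derivatives of every unit's activation with respect to every weight, so it obtains the exact gradient at each step. The cost is high: on the order of N^4 operations per time step for a fully connected network of N units.

Extended Kalman filtering (EKF) brings a method from classical estimation and control theory to neural networks. EKF is a nonlinear filter. In EKF-based training, the network weights are treated as the state of a dynamical system and are estimated from the observed teacher outputs. EKF is effective for nonlinear dynamical systems, but its computational cost is also relatively high.

The second half of the tutorial focuses on echo state networks (ESNs). An ESN is an RNN variant with sparse, randomly initialized recurrent connections. Its key ingredient is the "echo state property": the network's internal state retains and reflects the history of the input sequence. Training an ESN is usually simpler than training a conventional RNN, because only the output weights are trained, while the input and internal weights stay fixed. This lets ESNs perform well on many tasks, especially sequence prediction.

The tutorial uses simple examples and detailed explanations so that beginners can understand and apply these methods. The document's layout is not ideal, but its content is valuable for understanding how RNNs are trained and used in practice. This holds especially for readers interested in natural language processing, computer vision and machine learning. After working through it, readers should have a grasp of the core concepts of RNN training and a basis for implementing these techniques.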
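The central point about ESN training is that only the output weights are learned from data. A minimal sketch in plain Python can illustrate this; it is not Jaeger's implementation, and several details are assumptions for illustration:

- The reservoir size (20 units), the connection density (0.2) and the delay-1 recall task are hypothetical choices.
- The helper names `make_reservoir`, `run_reservoir`, `solve` and `train_readout` are also hypothetical.
- The sketch does not scale the reservoir by its spectral radius. It uses a simpler substitute: it scales the largest absolute row sum to 0.9, which bounds the spectral radius by 0.9 and makes the tanh state update a contraction.
- The readout is fitted by linear regression with a small ridge term.

```python
import math
import random


def make_reservoir(n_units, n_inputs, row_sum=0.9, density=0.2, seed=0):
    """Random sparse reservoir weights W and input weights W_in; both stay fixed."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
          for _ in range(n_units)] for _ in range(n_units)]
    # Stand-in for spectral-radius scaling: set the largest absolute row sum to
    # row_sum < 1, which bounds the spectral radius and makes the tanh update a
    # contraction (a sufficient condition for the echo state property).
    max_row = max(sum(abs(w) for w in row) for row in W) or 1.0
    W = [[w * row_sum / max_row for w in row] for row in W]
    W_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_units)]
    return W, W_in


def run_reservoir(W, W_in, inputs):
    """Drive the reservoir x(n) = tanh(W x(n-1) + W_in u(n)); return all states."""
    x = [0.0] * len(W)
    states = []
    for u in inputs:
        x = [math.tanh(sum(W[i][j] * x[j] for j in range(len(x)))
                       + sum(W_in[i][k] * u[k] for k in range(len(u))))
             for i in range(len(W))]
        states.append(x)
    return states


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][k] * w[k] for k in range(i + 1, n))) / M[i][i]
    return w


def train_readout(features, targets, ridge=1e-6):
    """Fit one linear output unit: solve (F^T F + ridge * I) w = F^T d."""
    m = len(features[0])
    A = [[sum(f[i] * f[j] for f in features) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(f[i] * d for f, d in zip(features, targets)) for i in range(m)]
    return solve(A, b)


# Hypothetical demo task (not from the tutorial): recall the previous input.
rng = random.Random(1)
u_seq = [[rng.uniform(-0.5, 0.5)] for _ in range(300)]
d_seq = [0.0] + [u[0] for u in u_seq[:-1]]  # d(n) = u(n-1)
W, W_in = make_reservoir(n_units=20, n_inputs=1)
states = run_reservoir(W, W_in, u_seq)
washout = 50  # discard the initial transient, which still depends on x(0)
feats = [u + x for u, x in zip(u_seq[washout:], states[washout:])]  # (u(n), x(n))
targ = d_seq[washout:]
w_out = train_readout(feats, targ)
pred = [sum(w * f for w, f in zip(w_out, row)) for row in feats]
mse = sum((p - d) ** 2 for p, d in zip(pred, targ)) / len(targ)
```

Only `w_out` is computed from the training data. `W` and `W_in` are drawn once and never updated, so fitting the network reduces to a single linear regression rather than gradient descent through time.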

where (u(n+1), x(n+1)) denotes the concatenated vector made from the input and internal activation vectors. We will use output transfer functions $f^{\mathrm{out}} = \tanh$ or $f^{\mathrm{out}} = \mathrm{id}$ (the identity); in the latter case we have linear output units.
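This excerpt shows only the `where` clause of the output equation. The sketch below therefore assumes the output is computed as $y(n+1) = f^{\mathrm{out}}(W^{\mathrm{out}}(u(n+1), x(n+1)))$. It shows how the two choices of output transfer function differ. The function names `readout` and `identity` are hypothetical.

```python
import math


def identity(s):
    """Linear output units: f_out(s) = s."""
    return s


def readout(W_out, u, x, f_out=math.tanh):
    """Assumed output equation y(n+1) = f_out(W_out (u(n+1), x(n+1))).

    W_out has one row per output unit; each row has len(u) + len(x) entries.
    """
    z = list(u) + list(x)  # concatenated input and internal activation vector
    return [f_out(sum(w * z_j for w, z_j in zip(row, z))) for row in W_out]


# One output unit, one input, two internal units (illustrative numbers).
y_tanh = readout([[1.0, 0.5, -2.0]], [0.2], [0.4, 0.1])
y_linear = readout([[1.0, 0.5, -2.0]], [0.2], [0.4, 0.1], f_out=identity)
```

With `identity`, the output is just the weighted sum. With `tanh`, the same sum is squashed into (-1, 1).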

1.4 Example: a little timer network

Consider the input-output task of timing. The input signal has two components. The first component $u_1(n)$ is 0 most of the time, but sometimes jumps to 1. The second component $u_2(n)$ can take values between 0.1 and 1.0 in increments of 0.1, and assumes a new (random) one of these values each time $u_1(n)$ jumps to 1. The desired output is 0.5 for $10 \times u_2(n)$ time steps after $u_1(n)$ was 1, and 0 otherwise. This amounts to implementing a timer: $u_1(n)$ gives the "go" signal for the timer, and $u_2(n)$ gives the desired duration.
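A generator for this task's training data makes the timing rule concrete. The text leaves three things open:

- how often the go signal fires;
- whether the 0.5 output begins on the pulse step or the next one;
- what happens if a new pulse arrives while the timer is running.

This sketch assumes a per-step pulse probability `go_prob`, an output that starts on the step after the pulse, and a timer restart on each new pulse. `timer_task` is a hypothetical helper name.

```python
import random


def timer_task(n_steps, go_prob=0.05, seed=0):
    """Generate input components u1, u2 and teacher output d for the timer task.

    Assumptions (not fixed by the text): a go pulse arrives with probability
    go_prob per step, the 0.5 output starts on the step after the pulse, and a
    new pulse restarts the timer.
    """
    rng = random.Random(seed)
    u1, u2, d = [], [], []
    ticks = rng.randint(1, 10)  # duration code u2 = ticks / 10 before the first pulse
    remaining = 0               # output steps at 0.5 still to emit
    for _ in range(n_steps):
        d.append(0.5 if remaining > 0 else 0.0)
        remaining = max(remaining - 1, 0)
        go = rng.random() < go_prob
        if go:
            ticks = rng.randint(1, 10)  # new random u2 in {0.1, ..., 1.0}
            remaining = ticks           # hold 0.5 for 10 * u2 steps
        u1.append(1.0 if go else 0.0)
        u2.append(ticks / 10)
    return u1, u2, d


u1, u2, d = timer_task(200)
```

Storing the duration as an integer tick count and dividing by 10 only when emitting $u_2$ avoids floating-point rounding in the countdown.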

Figure 1.6: Schema of the timer network.

The following figure shows traces of input and output generated by an RNN trained on this task according to the ESN approach:

Figure 1.7: Performance of an RNN trained on the timer task. Solid line in last graph: desired (teacher) output. Dotted line: network output.

Clearly, this task requires the RNN to act as a memory: it has to retain information about the "go" signal for several time steps. This is possible because the internal recurrent connections let the "go" signal "reverberate" in the internal units' activations. Generally, tasks requiring some form of memory are candidates for RNN modeling.
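A single recurrent unit is enough to see this "reverberation" effect in miniature. This toy sketch is not the trained timer network; the function `impulse_trace` and the weight 0.9 are illustrative assumptions. It feeds one input pulse into the unit $x(n) = \tanh(w\,x(n-1) + u(n))$ and records how long a trace of the pulse survives. With $w = 0$ the pulse is forgotten after one step, while with $w = 0.9$ it decays only gradually.

```python
import math


def impulse_trace(w, steps=10):
    """Activations of one unit x(n) = tanh(w * x(n-1) + u(n)) after a single pulse."""
    x, trace = 0.0, []
    for n in range(steps):
        u = 1.0 if n == 0 else 0.0  # one "go" pulse at n = 0, silence afterwards
        x = math.tanh(w * x + u)    # the recurrent weight w feeds the past back in
        trace.append(x)
    return trace


no_memory = impulse_trace(0.0)    # pulse vanishes immediately
with_memory = impulse_trace(0.9)  # pulse "reverberates" for several steps
```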

2. Standard training techniques for RNNs

During the last decade, several methods for supervised training of RNNs have been explored. In this tutorial we present the currently most important ones: backpropagation through time (BPTT), real-time recurrent learning (RTRL), and techniques based on extended Kalman filtering (EKF). BPTT is probably the most widely used, RTRL is the mathematically most straightforward, and EKF is (arguably) the technique that gives the best results.

2.1 Backpropagation revisited

BPTT is an adaptation of backpropagation, the most commonly used training method for feedforward networks. We start with a recap of this method.

We consider a multilayer perceptron (MLP) with k hidden layers. Together with the layer of input units and the layer of output units, this gives k+2 layers of units altogether, which we number 0, ..., k+1 (Fig. 1.1, left, shows an MLP with two hidden layers). The number of input units is K, the number of output units is L, and the number of units in hidden layer m is $N_m$. The weight between the j-th unit in layer m and the i-th unit in layer m+1 is denoted by $w_{ij}^m$. The activation of the i-th unit in layer m is $x_i^m$ (for m = 0 this is an input value, for m = k+1 an output value).

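The training data for a feedforward network training task consist of T input-output (vector-valued) data pairs

(2.1)  $u(n) = (x_1^0(n), \ldots, x_K^0(n))^T, \quad d(n) = (d_1(n), \ldots, d_L(n))^T, \qquad n = 1, \ldots, T,$

where n denotes the training instance, not time. The activation of non-input units is computed according to

(2.2)  $x_i^{m+1}(n) = f\Big(\sum_j w_{ij}^m \, x_j^m(n)\Big)$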

(Usually one also includes bias terms, which we omit here.) Presented with teacher input u(n), the previous update equation is used to compute the activations of units in subsequent hidden layers, until a network response

(2.3)  $y(n) = (x_1^{k+1}(n), \ldots, x_L^{k+1}(n))^T$

is obtained in the output layer. The objective of training is to find a set of network weights such that the summed squared error

$E = \sum_{n=1}^{T} \lVert d(n) - y(n) \rVert^2$

is minimized.
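The forward pass of Eqs. (2.2) and (2.3) and the summed squared error $E = \sum_{n=1}^{T} \lVert d(n) - y(n) \rVert^2$ can be written out directly. This is a sketch under stated assumptions:

- Bias terms are omitted, as in the text.
- $f = \tanh$ is the default, since the excerpt leaves $f$ unspecified.
- The nested-list layout `weights[m][i][j]` for $w_{ij}^m$ is our own choice.
- The 2-2-1 network in the usage lines is a hypothetical example.

```python
import math


def forward(weights, u, f=math.tanh):
    """Eqs. (2.2)-(2.3) without bias terms.

    weights[m][i][j] holds w_ij^m, the weight from unit j in layer m to unit i
    in layer m+1, for m = 0, ..., k. Returns y(n) = x^{k+1}(n).
    """
    x = list(u)  # layer 0: x^0(n) = u(n)
    for W in weights:
        # Each unit i of the next layer applies f to its weighted input sum.
        x = [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]
    return x


def summed_squared_error(weights, data, f=math.tanh):
    """E = sum over training pairs (u(n), d(n)) of ||d(n) - y(n)||^2."""
    total = 0.0
    for u, d in data:
        y = forward(weights, u, f)
        total += sum((d_i - y_i) ** 2 for d_i, y_i in zip(d, y))
    return total


# Hypothetical 2-2-1 network (k = 1 hidden layer) and a single training pair.
weights = [[[1.0, 0.0], [0.0, 1.0]],  # w^0: input layer -> hidden layer
           [[1.0, 1.0]]]              # w^1: hidden layer -> output layer
data = [([0.3, 0.4], [1.0])]
E = summed_squared_error(weights, data)
```

Training then adjusts every $w_{ij}^m$ to reduce `E`; backpropagation is the procedure that supplies the required derivatives.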
