RNN Unit Comparison: GRU vs. LSTM in Sequence Modeling


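"This paper compares different types of recurrent units in recurrent neural networks (RNNs), focusing on more sophisticated units with gating mechanisms, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU). The units are evaluated on polyphonic music modeling and speech signal modeling. The results show that these advanced units outperform traditional units such as tanh units, and that the GRU performs comparably to the LSTM."

In machine learning, recurrent neural networks (RNNs) have shown considerable promise in recent years because they can handle variable-length inputs and outputs, as in the range of tasks reported by Graves (2012). An RNN maintains an internal state that captures temporal dependencies in sequence data, which makes it a natural fit for tasks such as natural language processing, speech recognition, and time-series forecasting. In this 2014 paper, Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio (Université de Montréal; Bengio is also a CIFAR Senior Fellow) examine different designs of the recurrent unit. They concentrate on units with gating mechanisms, because gating helps mitigate the vanishing-gradient problem that makes long-term dependencies hard to learn.

1. Long short-term memory (LSTM)

The LSTM was proposed by Hochreiter and Schmidhuber (1997). In its now-standard form it controls the flow of information with an input gate, a forget gate, and an output gate, so the network can selectively retain or discard past context. LSTM units handle long-range dependencies well, which has made them a common first choice for many sequence modeling tasks.

2. Gated recurrent unit (GRU)

The GRU was proposed by Kyunghyun Cho et al. in 2014 and can be viewed as a simplified relative of the LSTM. It folds the roles of the input and forget gates into a single update gate, which lowers the computational cost while keeping the ability to capture long-term dependencies. A GRU controls the flow of information with a reset gate and an update gate. With this simpler structure it can match LSTM performance in some settings and may train faster.

3. Experimental results

In experiments on polyphonic music modeling and speech signal modeling, the authors found that both the GRU and the LSTM clearly outperformed the traditional RNN unit with a tanh activation. This suggests that gated recurrent units have an advantage in capturing sequential patterns. The GRU performed comparably to the LSTM, so in some applications it may be the more efficient choice, since it usually needs fewer computational resources.

4. Application outlook

These findings matter for how RNNs are applied. When resources are limited, the GRU is a viable alternative to the LSTM, particularly in real-time applications or on resource-constrained devices. In other fields that process sequence data, such as natural language generation, video analysis, or financial market prediction, RNN models built from gated recurrent units may improve both performance and efficiency. The paper offers an empirical comparison of different recurrent units and highlights the importance of gating for sequence data. As computing power grows and optimization techniques improve, RNNs, and LSTMs and GRUs in particular, can be expected to remain useful in machine learning tasks.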
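To make the reset and update gates described in section 2 above more concrete, the following is a minimal sketch of a single GRU update in plain Python. It assumes the commonly used GRU formulation (update gate z, reset gate r, candidate state, then a linear interpolation between the old state and the candidate). The summary itself does not give equations, so the weight names (`Wz`, `Uz`, `Wr`, `Ur`, `W`, `U`), the omission of bias terms, and the helper functions are all illustrative assumptions rather than the authors' implementation. Some references also swap the roles of z and 1 - z in the final interpolation.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gru_step(x, h, p):
    """One GRU update. p maps hypothetical names to weight matrices (no biases)."""
    # Update gate z: how much of the new candidate is mixed into the state.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    # Reset gate r: how much of the old state feeds the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W"], x), matvec(p["U"], rh))]
    # New state: element-wise interpolation between old state and candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def init_params(n_in, n_hid, seed=0):
    """Small random weights, chosen only for demonstration."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"Wz": mat(n_hid, n_in), "Uz": mat(n_hid, n_hid),
            "Wr": mat(n_hid, n_in), "Ur": mat(n_hid, n_hid),
            "W": mat(n_hid, n_in), "U": mat(n_hid, n_hid)}


# Run the cell over a toy sequence of three one-hot inputs.
params = init_params(n_in=3, n_hid=4)
h = [0.0] * 4
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h = gru_step(x, h, params)
```

Because the new state is a convex combination of the previous state and a tanh output, every component of `h` stays strictly between -1 and 1 when the initial state is zero. An LSTM step would follow the same pattern with a separate memory cell and three gates instead of two.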

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho
Université de Montréal

Yoshua Bengio
Université de Montréal
CIFAR Senior Fellow

Abstract

In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on the tasks of polyphonic music modeling and speech signal modeling. Our experiments revealed that these advanced recurrent units are indeed better than more traditional recurrent units such as tanh units. Also, we found GRU to be comparable to LSTM.

1 Introduction

Recurrent neural networks have recently shown promising results in many machine learning tasks, especially when input and/or output are of variable length [see, e.g., Graves, 2012]. More recently, Sutskever et al. [2014] and Bahdanau et al. [2014] reported that recurrent neural networks are able to perform as well as the existing, well-developed systems on a challenging task of machine translation.

One interesting observation we make from these recent successes is that almost none of them were achieved with a vanilla recurrent neural network. Rather, it was a recurrent neural network with sophisticated recurrent hidden units, such as long short-term memory units [Hochreiter and Schmidhuber, 1997], that was used in those successful applications.

Among those sophisticated recurrent units, in this paper, we are interested in evaluating two closely related variants. One is a long short-term memory (LSTM) unit, and the other is a gated recurrent unit (GRU) proposed more recently by Cho et al. [2014]. It is well established in the field that the LSTM unit works well on sequence-based tasks with long-term dependencies, but the latter has only recently been introduced and used in the context of machine translation.

In this paper, we evaluate these two units and a more traditional tanh unit on the task of sequence modeling. We consider three polyphonic music datasets [see, e.g., Boulanger-Lewandowski et al., 2012] as well as two internal datasets provided by Ubisoft in which each sample is a raw speech representation.

Based on our experiments, we concluded that, with a fixed number of parameters for all models, the GRU can outperform the LSTM on some datasets, both in terms of convergence in CPU time and in terms of parameter updates and generalization.
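Holding the number of parameters fixed has a practical consequence: a unit with fewer gates can afford more hidden units under the same budget. The sketch below illustrates that arithmetic. It assumes textbook parameter counts (one affine map of the concatenated input and hidden state per gate or candidate, biases included, no peephole connections), which may differ from the exact LSTM variant used in the paper. The input size of 88 and the budget of 20000 are hypothetical values chosen only for illustration.

```python
# Number of affine maps [x, h] -> n_hid per unit type (textbook assumption):
# tanh: 1 candidate; GRU: update, reset, candidate; LSTM: input, forget, output, candidate.
BLOCKS = {"tanh": 1, "gru": 3, "lstm": 4}


def cell_params(n_in, n_hid, blocks):
    """Weights plus biases for `blocks` affine maps from [x, h] to n_hid."""
    return blocks * (n_hid * (n_in + n_hid) + n_hid)


def largest_hidden_size(n_in, budget, blocks):
    """Largest hidden size whose parameter count fits within the budget."""
    n_hid = 0
    while cell_params(n_in, n_hid + 1, blocks) <= budget:
        n_hid += 1
    return n_hid


# Hypothetical setting: 88-dimensional inputs and a budget of 20000 parameters.
sizes = {name: largest_hidden_size(88, 20000, b) for name, b in BLOCKS.items()}
for name, n_hid in sizes.items():
    print(name, n_hid, cell_params(88, n_hid, BLOCKS[name]))
```

Under these assumptions the tanh unit gets the most hidden units and the LSTM the fewest, which is why comparisons at a matched parameter count do not compare equal hidden sizes.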

2 Background: Recurrent Neural Network

A recurrent neural network (RNN) is an extension of a conventional feedforward neural network, which is able to handle a variable-length sequence input. The RNN handles the variable-length

arXiv:1412.3555v1 [cs.NE] 11 Dec 2014
