Regularization for RNNs and LSTMs: Applying Dropout and Its Effects

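This paper examines how to use dropout as an effective regularizer in recurrent neural networks (RNNs), in particular long short-term memory (LSTM) networks, in order to reduce overfitting, and it reports clear improvements on several tasks.

The RNN is a neural sequence model that achieves state-of-the-art performance on tasks such as language modeling, speech recognition, and machine translation. Successful applications of neural networks, however, depend on good regularization. Standard dropout is very effective in feed-forward networks but does not work well when applied directly to RNNs and LSTMs.

Dropout is a widely used regularization method: during training it randomly drops a fraction of the units so that the model cannot rely too heavily on particular features, which reduces overfitting. In an RNN, applying dropout naively injects noise into the recurrent connections, and the recurrence amplifies that noise across time steps, which hurts learning. The authors show how to apply dropout to LSTMs correctly: dropout is applied only to a subset of the network's connections, namely the non-recurrent ones that pass information between layers, while the recurrent connections that carry state from one time step to the next are left untouched. The network can then retain information over many time steps and still benefit from the regularizing effect of dropout.

The authors demonstrate the technique on language modeling, speech recognition, image caption generation, and machine translation. The experiments show that it substantially reduces overfitting and improves generalization. The best dropout rate can differ between tasks and model sizes, so in practice it is chosen by hyperparameter tuning on a validation set.

This work, submitted to ICLR 2015, showed how dropout can be applied successfully to RNNs, and to LSTMs in particular. It has become one of the standard regularization techniques for sequence models.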

arXiv:1409.2329v5 [cs.NE] 19 Feb 2015

Under review as a conference paper at ICLR 2015

RECURRENT NEURAL NETWORK REGULARIZATION

Wojciech Zaremba∗

New York University

woj.zaremba@gmail.com

Ilya Sutskever, Oriol Vinyals

Google Brain

{ilyasu,vinyals}@google.com

ABSTRACT

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.


1 INTRODUCTION

The Recurrent Neural Network (RNN) is a neural sequence model that achieves state-of-the-art performance on important tasks that include language modeling (Mikolov, 2012), speech recognition (Graves et al., 2013), and machine translation (Kalchbrenner & Blunsom, 2013). It is known that successful applications of neural networks require good regularization. Unfortunately, dropout (Srivastava, 2013), the most powerful regularization method for feedforward neural networks, does not work well with RNNs. As a result, practical applications of RNNs often use models that are too small, because large RNNs tend to overfit. Existing regularization methods give relatively small improvements for RNNs (Graves, 2013). In this work, we show that dropout, when correctly used, greatly reduces overfitting in LSTMs, and evaluate it on three different problems.

The code for this work can be found at https://github.com/wojzaremba/lstm.

2 RELATED WORK

Dropout (Srivastava, 2013) is a recently introduced regularization method that has been very successful with feed-forward neural networks. While much work has extended dropout in various ways (Wang & Manning, 2013; Wan et al., 2013), there has been relatively little research in applying it to RNNs. The only paper on this topic is by Bayer et al. (2013), who focus on "marginalized dropout" (Wang & Manning, 2013), a noiseless deterministic approximation to standard dropout. Bayer et al. (2013) claim that conventional dropout does not work well with RNNs because the recurrence amplifies noise, which in turn hurts learning. In this work, we show that this problem can be fixed by applying dropout to a certain subset of the RNNs' connections. As a result, RNNs can now also benefit from dropout.
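To make "a certain subset of the RNNs' connections" concrete, here is a minimal sketch, assuming that the subset is the non-recurrent (layer-to-layer) connections, as the full paper describes. It is not the authors' implementation: a plain tanh recurrent layer stands in for an LSTM to keep the example short, and the names `dropout`, `RecurrentLayer`, and `run_stack` are hypothetical. It uses only the Python standard library.

```python
import math
import random


def dropout(vec, rate, rng):
    """Inverted dropout: zero each entry with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]


def matvec(matrix, vec):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; the initialization scale is an arbitrary choice."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentLayer:
    """Plain tanh recurrent layer (hypothetical stand-in for an LSTM layer)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        self.w_in = rand_matrix(n_hidden, n_in, rng)       # non-recurrent weights
        self.w_rec = rand_matrix(n_hidden, n_hidden, rng)  # recurrent weights

    def step(self, x, h_prev):
        from_below = matvec(self.w_in, x)
        from_past = matvec(self.w_rec, h_prev)
        return [math.tanh(a + b) for a, b in zip(from_below, from_past)]


def run_stack(layers, inputs, rate, rng, training=True):
    """Run a stack of recurrent layers over a sequence of input vectors.

    Dropout is applied only where a signal moves between layers (or to the
    output). The hidden state handed to the next time step is never masked.
    """
    states = [[0.0] * layer.n_hidden for layer in layers]
    outputs = []
    use_dropout = training and rate > 0.0
    for x in inputs:
        signal = x
        for i, layer in enumerate(layers):
            if use_dropout:
                signal = dropout(signal, rate, rng)    # non-recurrent path
            states[i] = layer.step(signal, states[i])  # recurrent path, no dropout
            signal = states[i]
        if use_dropout:
            signal = dropout(signal, rate, rng)        # top layer to output
        outputs.append(signal)
    return outputs


# Usage: two stacked layers over a five-step sequence.
rng = random.Random(0)
layers = [RecurrentLayer(3, 4, rng), RecurrentLayer(4, 4, rng)]
inputs = [[1.0, 0.5, -0.5] for _ in range(5)]
train_out = run_stack(layers, inputs, rate=0.5, rng=random.Random(1))
eval_out = run_stack(layers, inputs, rate=0.5, rng=random.Random(1), training=False)
```

Because the recurrent path is never masked, the dropout noise is not fed back through the recurrence at every step, which is the amplification that Bayer et al. (2013) describe. At evaluation time (`training=False`) no units are dropped, and the inverted-dropout rescaling during training keeps the expected activations comparable between the two modes.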

Independently of our work, Pham et al. (2013) developed the very same RNN regularization method and applied it to handwriting recognition. We rediscovered this method and demonstrated strong empirical results over a wide range of problems. Other work that applied dropout to LSTMs is Pachitariu & Sahani (2013).

∗ Work done while the author was at Google Brain.
