that contribute the most to previous tasks. We use EWC to
evaluate the regularization mechanism.
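As a point of reference, the penalty that EWC adds to the loss on a new task can be sketched as follows. This is a minimal PyTorch illustration of the published quadratic penalty, not the original implementation; the regularization strength `lam` and the dictionaries holding diagonal Fisher estimates and previous-task parameters are placeholder names.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=0.4):
    """Quadratic EWC penalty: (lam/2) * sum_i F_i * (theta_i - theta_i*)^2.

    `fisher` and `old_params` map parameter names to diagonal Fisher
    estimates and to the parameter values saved after the previous task.
    """
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:
            # Weights deemed important for earlier tasks (large F_i) are
            # penalized more strongly for drifting from their old values.
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty
```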
Ensemble Methods
Ensemble methods attempt to mitigate catastrophic forgetting by explicitly or implicitly training multiple classifiers together and then combining them to generate the final prediction. Explicit methods, such as Learn++ and TradaBoost, prevent forgetting because an entirely new sub-network is trained for each new batch (Polikar et al. 2001; Dai et al. 2007). However, memory usage scales with the number of batches, which is highly undesirable. Moreover, this prevents portions of the network from being re-used for the new batch. Two methods that try to alleviate the memory usage problem are Accuracy Weighted Ensembles and Life-long Machine Learning (Wang et al. 2003; Ren et al. 2017). These methods automatically decide whether a sub-network should be added to or removed from the ensemble.
PathNet can be considered an implicit ensemble method (Fernando et al. 2017). It uses a genetic algorithm to find an optimal path through a fixed-size neural network for each batch that it learns. The weights along this path are then frozen so that the knowledge is not lost when new batches are learned. In contrast to the explicit ensembles, the base network's size is fixed and learned representations can be re-used, which allows for smaller, more deployable models. The authors showed that PathNet learned subsequent tasks more quickly, but not how well earlier tasks were retained. We selected PathNet to evaluate the ensembling mechanism, and we show how well it retains pre-trained information.
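A minimal sketch of the PathNet idea follows (our illustration, not the authors' implementation). A randomly sampled path stands in for the genetic search, and the module count and layer width are arbitrary placeholders.

```python
import random
import torch
import torch.nn as nn

class ModularLayer(nn.Module):
    """A layer holding several candidate modules; a path activates a few."""
    def __init__(self, n_modules=10, dim=32):
        super().__init__()
        self.candidates = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_modules))

    def forward(self, x, active):
        # Sum the outputs of the modules selected by the current path.
        return torch.relu(sum(self.candidates[i](x) for i in active))

def sample_path(n_layers, n_modules, per_layer=3):
    # Stand-in for the genetic search: one random candidate path.
    return [random.sample(range(n_modules), per_layer) for _ in range(n_layers)]

def freeze_path(layers, path):
    # Freeze the winning path so later batches cannot overwrite it.
    for layer, active in zip(layers, path):
        for i in active:
            for p in layer.candidates[i].parameters():
                p.requires_grad = False
```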
Rehearsal Methods
Rehearsal methods try to mitigate catastrophic forgetting by mixing data from earlier batches with the current batch to be learned (Robins 1995). The cost is that this requires storing past data, which is not resource efficient. Pseudorehearsal methods use the network to generate pseudopatterns (Robins 1995) that are combined with the new batch to be learned. Pseudopatterns allow the network to stabilize older memories without the requirement for storing all previously observed training data points. Draelos et al. (2016) used this approach to incrementally train an autoencoder, where each batch contained images from a specific category. After the autoencoder learned a particular batch, they passed the batch through the encoder and stored the output statistics. During replay, they used these statistics and the decoder network to generate the appropriate pseudopatterns for each class.
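The pseudorehearsal scheme described above can be sketched as follows. This is our reading of the approach, not the authors' code; `encoder` and `decoder` are hypothetical callables for the two halves of the autoencoder.

```python
import torch

@torch.no_grad()
def store_code_stats(encoder, images):
    # After learning a batch, record per-class statistics of the encoder codes.
    codes = encoder(images)
    return codes.mean(dim=0), codes.std(dim=0)

@torch.no_grad()
def generate_pseudopatterns(decoder, mean, std, n_samples=64):
    # During replay, sample codes from the stored statistics and decode them
    # into pseudopatterns for the corresponding class.
    codes = mean + std * torch.randn(n_samples, mean.shape[0])
    return decoder(codes)
```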
The SOM model proposed by Gepperth and Karaoguz (2016) reserves its training data for replay after each new class is trained. This model uses a self-organizing map (SOM) as a hidden layer to topologically reorganize the data from the input layer (i.e., clustering the input onto a 2-D lattice). We use this model to explore the value of rehearsal.
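The rehearsal step itself, mixing reserved data with the current batch, can be sketched as below (our illustration, using a plain tensor buffer in place of the SOM-organized hidden layer; the replay size is a placeholder).

```python
import torch

def rehearsal_batch(new_x, new_y, memory_x, memory_y, n_replay=32):
    # Mix stored examples from earlier batches into the current training batch.
    idx = torch.randperm(memory_x.shape[0])[:n_replay]
    x = torch.cat([new_x, memory_x[idx]], dim=0)
    y = torch.cat([new_y, memory_y[idx]], dim=0)
    return x, y
```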
Dual-Memory Models
Dual-memory models are inspired by memory consolidation in the mammalian brain, which is thought to store memories in two distinct neural networks. Newly formed memories are stored in a brain region known as the hippocampus. These memories are then slowly transferred/consolidated to the pre-frontal cortex during sleep. Several algorithms based on these ideas have been created. Early work used fast (hippocampal) and slow (cortical) training networks to separate pattern-processing areas, and passed pseudopatterns back and forth to consolidate recent and remote memories (French 1997). In general, dual-memory models incorporate rehearsal, but not all rehearsal-based models are dual-memory models.
Another model proposed by Gepperth and Karaoguz (2016), which we denote STM, stores new inputs that yield a highly uncertain prediction in a short-term memory buffer. The model then seeks to consolidate the new memories into the entire network during a separate sleep phase. They showed that STM could incrementally learn MNIST classes without forgetting previously trained ones. We use SOM and STM to evaluate the dual-memory approach.
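The short-term-memory mechanism can be sketched as follows. This is our illustration of the described behavior, not the original implementation; the uncertainty threshold and the list-based buffer are placeholders.

```python
import torch

def maybe_buffer(model, x, y, buffer, threshold=0.5):
    # Store samples with highly uncertain predictions in short-term memory.
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=-1)
    if probs.max().item() < threshold:
        buffer.append((x, y))

def sleep_phase(model, optimizer, loss_fn, buffer):
    # "Sleep": consolidate the buffered memories into the long-term network.
    for x, y in buffer:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    buffer.clear()
```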
Sparse-Coding Methods
Catastrophic forgetting occurs when new internal representations interfere with previously learned ones (French 1999). Sparse representations can reduce the chance of this interference; however, sparsity can impair generalization and the ability to learn new tasks (Sharkey and Sharkey 1995).
Two models that implicitly use sparsity are CALM and ALCOVE. To learn new data, CALM searches among competing nodes to see which nodes have not been committed to another representation (Murre 2014). ALCOVE is a shallow neural network that uses a sparse distance-based representation, which allows the weights assigned to older tasks to be largely unchanged when the network is presented with new data (Kruschke 1992). The Sparse Distributed Memory (SDM) is a convolution-correlation model that uses sparsity to reduce the overlap between internal representations (Kanerva 1988). CHARM and TODAM are also convolution-correlation models that use internal codings to ensure that new input representations remain orthogonal to one another (Murdock 1983; Eich 1982).
The Fixed Expansion Layer (FEL) model creates sparse representations by fixing the network's weights and specifying neuron triggering conditions (Coop, Mishtal, and Arel 2013). FEL uses excitatory and inhibitory fixed weights to sparsify the input, which gates the weight updates throughout the network. This enables the network to retain prior learned mappings and reduce representational overlap. We use FEL to evaluate the sparsity mechanism.
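A sparsifier in this spirit can be sketched as below. This is our illustration, not the FEL implementation; the fixed weight matrix `w_fixed` and the number of active units `k` are placeholders.

```python
import torch

def fixed_expansion(x, w_fixed, k=16):
    # w_fixed holds frozen excitatory/inhibitory weights and is never trained.
    h = x @ w_fixed
    topk = torch.topk(h, k, dim=-1)
    mask = torch.zeros_like(h).scatter_(-1, topk.indices, 1.0)
    # Only the k most strongly driven units stay active; the resulting sparse
    # code limits which downstream weights receive updates.
    return h * mask
```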
Experimental Setup
We explore how well methods for mitigating catastrophic forgetting scale to hard datasets involving fine-grained image and audio classification. These datasets were chosen because they contain 1) different data modalities (image and audio), 2) a large number of classes, and 3) a small number of samples per class. These datasets represent more meaningful, real-world problems and are more practical than MNIST. We also use MNIST to showcase the value of these real-world datasets. See Table 1 for dataset statistics.