Fig. 4. Dynamic layer skipping. (a) Layer skipping based on a halting score. (b) Layer skipping based on a gating function. (c) Layer skipping based on a policy network. The dashed features in (a) are not calculated, conditioned on the halting score; the gating module in (b) decides whether to execute the layer/block; and the extra policy network in (c) directly generates the skipping decisions for all layers in the main network.
c) Multi-scale architecture with early exits. Researchers [12] have observed that in chain-structured networks, the multiple classifiers may interfere with each other, which degrades the overall performance. A reasonable interpretation is that in regular CNNs, the high-resolution features lack the global information that is essential for classification, leading to unsatisfactory results for early exits. Moreover, early classifiers would force the shallow layers to generate task-specialized features at the cost of losing part of the general information, leading to degraded performance for deep exits.
To address this issue, multi-scale dense network (MSDNet)
[12] adopts 1) a multi-scale architecture, to quickly gener-
ate coarse-level features that are suitable for classification;
2) dense connections, to reuse early features and improve
the performance of deep classifiers (see Fig. 2 (a)). Such
a specially-designed architecture effectively enhances the
overall accuracy of all the classifiers in the network.
Besides the architecture design, the exiting policies and
training techniques are also important for the model per-
formance. Apart from the confidence-based criteria in [12],
policy networks are built for the multi-scale dynamic mod-
els with early classifiers (see Fig. 2 (b)) [61], [62] to make
decisions on whether each instance should exit. As for
training, specific techniques are studied in [63] for multi-exit
networks. More discussion about the inference and training
schemes for dynamic models will be reviewed in Sec. 5.
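To make the confidence-based exiting criterion concrete, a minimal PyTorch-style sketch is given below: the model runs its stages sequentially and returns the first intermediate prediction whose softmax confidence exceeds a threshold. The names (backbone_stages, classifiers) and the scalar threshold are illustrative assumptions rather than the actual interface of MSDNet or its variants.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def early_exit_inference(x, backbone_stages, classifiers, threshold=0.9):
    """Confidence-based early exiting (sketch).

    backbone_stages : list of nn.Module, executed sequentially (assumed names)
    classifiers     : list of nn.Module, one exit attached to each stage
    Returns the prediction of the first exit whose max softmax probability
    exceeds `threshold`; otherwise the last exit is used.
    """
    feat = x
    logits = None
    for stage, classifier in zip(backbone_stages, classifiers):
        feat = stage(feat)                          # run the next stage of the backbone
        logits = classifier(feat)                   # intermediate prediction
        confidence = F.softmax(logits, dim=1).max(dim=1).values
        if confidence.item() >= threshold:          # assumes batch size 1 at inference
            return logits                           # exit early: deeper stages are skipped
    return logits                                   # fall back to the final classifier
```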
The methods discussed above mostly implement the
early-exiting scheme via depth adaptation. From the per-
spective of exploiting spatial redundancy in features, res-
olution adaptive network (RANet, see Fig. 2 (c)) [30] further achieves resolution adaptation on top of depth adaptation. Specifically, the network first processes each instance with low-resolution features, and high-resolution representations are utilized only when the prediction confidence of the early classifiers is insufficient.
2) Layer skipping. In the aforementioned early-exiting paradigm, the general idea is to skip the execution of all the layers deeper than a certain classifier. More flexibly, the
network depth can also be adapted on the fly by strategi-
cally skipping the calculation of intermediate layers without
placing extra classifiers. Given the $i$-th input instance $x_i$, dynamic layer skipping could be generally written as
$$y_i = \left(\mathbb{1}_L \circ F_L\right) \circ \left(\mathbb{1}_{L-1} \circ F_{L-1}\right) \circ \cdots \circ \left(\mathbb{1}_1 \circ F_1\right)(x_i), \qquad (3)$$
where $\mathbb{1}_\ell$ denotes the indicator function determining the execution of layer $F_\ell$, $1 \le \ell \le L$. This scheme is typically im-
plemented on structures with skip connections (e.g. ResNet
[4]) to guarantee the continuity of forward propagation, and
here we summarize three representative approaches.
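Before detailing these approaches, Eq. (3) can be read directly as code: each layer is either executed or replaced by the identity mapping according to its indicator. The sketch below assumes the binary decisions are already available (e.g., produced by thresholded halting scores, gating modules, or a policy network); all names are hypothetical.

```python
def skip_forward(x, layers, execute):
    """Evaluate Eq. (3): each layer F_l is executed or replaced by the identity.

    layers  : residual blocks F_1, ..., F_L (assumed to preserve feature shape,
              so that skipping a block keeps the forward pass consistent)
    execute : list of booleans, the value of the indicator 1_l for each layer
    """
    for layer, run in zip(layers, execute):
        if run:
            x = layer(x)   # 1_l = 1: execute F_l
        # 1_l = 0: the block acts as the identity, x passes through unchanged
    return x
```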
a) The halting score. Adaptive computation time (ACT) [11] is built upon an RNN, where a scalar named the halting score is accumulated as multiple layers are sequentially executed within a time step, and the hidden state of the RNN is directly fed to the next step if the score
exceeds a threshold. The ACT method is further extended
to ResNet for vision tasks [31] by viewing residual blocks
within a stage^1 as linear layers within a step of an RNN (see
Fig. 4 (a)). Moreover, the halting score in [31] is allowed
to vary across spatial locations. Rather than skipping the
execution of layers with independent parameters, iterative
and adaptive mobile neural network (IamNN) [64] replaces
multiple residual blocks in each ResNet stage by one block
with shared weights, leading to a significant reduction of
parameters. In every stage, the block is executed for an
adaptive number of steps according to the halting score.
In addition to RNNs and CNNs, the halting scheme is
further implemented on Transformers [6] by [33] and [34] to
achieve dynamic network depth on NLP tasks.
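A simplified sketch of the halting mechanism is shown below: a small head predicts a scalar score after each block, the scores are accumulated, and the remaining blocks of the stage are skipped once the sum exceeds 1 − ε. The head design and the omission of ACT's weighted averaging of intermediate states are simplifying assumptions.

```python
import torch

@torch.no_grad()
def halting_forward(x, blocks, halting_heads, eps=0.01):
    """Halting-score-based skipping within one stage (simplified ACT-style sketch).

    blocks        : residual blocks of the stage (a single shared block in IamNN)
    halting_heads : small modules mapping a feature map to a halting score in (0, 1)
    """
    cumulative = 0.0
    for block, head in zip(blocks, halting_heads):
        x = block(x)                                    # execute the next block
        score = torch.sigmoid(head(x).mean()).item()    # scalar halting score
        cumulative += score
        if cumulative >= 1.0 - eps:                     # halting condition reached
            break                                       # remaining blocks are skipped
    return x
```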
b) Gating function. Apart from comparing the calculated halting scores with certain thresholds as in the aforementioned approaches, gating functions are also a prevalent option for making discrete decisions due to their plug-and-play property.
By generating binary values based on intermediate features,
a gating function can determine the skipping/execution of
a layer (block) on the fly (see Fig. 4 (b)).
Taking the layer skipping in ResNet as an example, let $x_\ell$ denote the input feature of the $\ell$-th residual block; the gating function $G_\ell$ generates a binary value to determine the execution of $F_\ell$. This procedure could be represented by^2
$$x_{\ell+1} = G_\ell(x_\ell)\, F_\ell(x_\ell) + x_\ell. \qquad (4)$$
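A possible implementation of Eq. (4) is sketched below, with a Conv-AIG-style gate built from global average pooling and two FC layers; the hidden width and the hard thresholding at inference are illustrative choices, and the training-time relaxation of the discrete decision is deferred to Sec. 5.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Residual block whose execution is decided by a gate, as in Eq. (4) (sketch)."""

    def __init__(self, block, channels, hidden=16):
        super().__init__()
        self.block = block                       # the residual function F_l
        self.gate = nn.Sequential(               # lightweight gate G_l:
            nn.AdaptiveAvgPool2d(1),             # global average pooling, then
            nn.Flatten(),                        # two FC layers (Conv-AIG style)
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        # Hard 0/1 decision at inference; training typically uses a differentiable
        # relaxation or RL to learn the gate (discussed in Sec. 5).
        g = (torch.sigmoid(self.gate(x)) > 0.5).float()
        if g.item() == 0.0:              # assumes batch size 1 at inference
            return x                     # skip: only the identity shortcut remains
        return x + self.block(x)         # execute: x_{l+1} = F_l(x_l) + x_l
```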
SkipNet [45] and convolutional network with adaptive
inference graph (conv-AIG) [46] are two representative ap-
proaches to enabling dynamic layer skipping. Both methods introduce lightweight computational overhead to efficiently produce the binary decisions on whether to skip the calculation of a residual block. Specifically, Conv-AIG utilizes two FC layers in each residual block, while the gating function in SkipNet is implemented as an RNN for parameter sharing.
Rather than skipping layers in classic ResNets, dynamic
recursive network [65] iteratively executes one block with
shared parameters in each residual stage. Although seemingly similar to the aforementioned IamNN [64], its decision policy differs significantly: instead of tuning a threshold for halting scores as in IamNN, gating modules are exploited by [65] to decide the recursion depth.
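The weight-sharing variants can be sketched as a single block applied recursively, where a gate (or a halting threshold in IamNN) decides after each step whether to recurse again; the gate below is a hypothetical simplification of the modules used in [64], [65].

```python
import torch

def recursive_stage(x, shared_block, gate, max_steps=4):
    """One stage executed as an adaptive number of recursions of a shared block.

    shared_block : a single residual block reused at every step (shared weights)
    gate         : module producing a scalar "continue" score from the feature
    """
    for _ in range(max_steps):
        x = x + shared_block(x)                     # one recursion step
        p_continue = torch.sigmoid(gate(x).mean())  # gate decides the recursion depth
        if p_continue.item() < 0.5:
            break                                   # stop recursing for this input
    return x
```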
Instead of either skipping a layer entirely or executing it at full numerical precision, a line of work [66], [67] studies adaptive bit-widths for different layers conditioned on the resource budget. Furthermore, fractional skip-
1. Here we refer to a stage as a stack of multiple residual blocks with
the same feature resolution.
2. For simplicity and without loss of generality, the subscript for the sample index will be omitted in the following.