optimization algorithm based upon a supervised training
criterion is used to fine-tune the deep multilayer neural
network.
The deep learning (DL) paradigm brought about a revival
in deep multilayered neural network research, and has
attracted unprecedented attention because of its success in
several areas, including vision and language recognition [3–
5]. The objective of DL approaches is to learn a hierarchical
model from the input characteristics. Lower-layer charac-
teristics in the hierarchical model are combined to form
higher-layer characteristics. Deep learning has been demon-
strated to be able to learn many hierarchical characteristics
automatically, which are then combined within an integrated
network [6].
Although the ability of hierarchical neural networks [7,
8] to learn characteristics is useful for pattern analysis, there
are still many problems to be solved in the DL paradigm. For
example, the characteristics learned in hidden layers are not
always transparent in their meaning, particularly in the early
hidden layers; the discrimination ability may occasionally
decrease [9]; the vanishing gradient may make it hard to
train a deep network [10]; and overfitting may occur when
very little training data is available [11]. Recent techniques
such as dropout [11] and dropconnect [12] are used to
regularize deep networks and avoid overfitting. The idea behind
these techniques is to randomly drop units or connections
to prevent units from co-adapting, which has been shown to
improve classification performance in numerous studies.
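To make this idea concrete, the sketch below shows a minimal inverted-dropout mask in NumPy; the function name and default drop probability are illustrative only and are not taken from [11] or [12]:

```python
import numpy as np

def dropout(activations, drop_prob=0.5, training=True, rng=np.random):
    """Inverted dropout: randomly zero units during training and rescale the
    survivors so the expected activation matches test-time behaviour."""
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    mask = rng.binomial(1, keep_prob, size=activations.shape)  # drop each unit independently
    return activations * mask / keep_prob
```

Because each unit is dropped independently, no unit can rely on the presence of particular other units, which is what discourages co-adaptation.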
Because data generally come from nonlinear manifold
distributions, they are not linearly separable. To abstract
and capture a larger amount of information from the
receptive fields, the network in network (NIN)
[13] model uses an mlpconv layer, where a multilayer
perceptron (MLP) convolves the input to enhance the non-
linearity of local patches. Thus, the discrimination ability
of the model is improved. Companion objective functions
are used to constrain the weights in hidden layers in deeply
supervised nets (DSNs) [14], so that robust features can
be captured in the first few layers of a deep convolutional
neural network (CNN).
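As a rough illustration of the mlpconv structure, such a layer can be built as an ordinary convolution followed by 1 × 1 convolutions, which act as a small MLP applied to every local patch. The PyTorch sketch below is a minimal example; the kernel size, channel counts, and the helper name mlpconv_block are assumptions made for illustration rather than the exact configuration of [13]:

```python
import torch.nn as nn

def mlpconv_block(in_channels, num_filters, mlp_hidden):
    """A linear convolution followed by a two-layer MLP realized as 1x1
    convolutions, enhancing the nonlinearity of each local patch."""
    return nn.Sequential(
        nn.Conv2d(in_channels, num_filters, kernel_size=5, padding=2),
        nn.ReLU(inplace=True),
        nn.Conv2d(num_filters, mlp_hidden, kernel_size=1),  # first MLP layer
        nn.ReLU(inplace=True),
        nn.Conv2d(mlp_hidden, mlp_hidden, kernel_size=1),   # second MLP layer
        nn.ReLU(inplace=True),
    )
```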
The vanishing gradient problem is essentially the shrinkage
of the gradient as it propagates backwards through
the hidden layers. It is noteworthy that some successful
approaches have used the strategy of adding hidden layers
to networks in a constructive manner [1]. To construc-
tively formulate a desirable internal representation, using
a supervised criterion in each phase provides straightfor-
ward supervision. However, it has been reported that using
a supervised criterion in each phase may be too greedy and
may not obtain as good generalization performance as using
an unsupervised criterion [15]. Another issue is that the data
distribution can vary during the DL procedure. Variations in
the data distribution may shift the inputs of a hidden layer
into the saturation region of its activation function,
which reduces the learning speed.
This phenomenon is referred to as internal covariate shift
[16]. Ioffe et al. [17] addressed this issue by applying batch
normalization to the input of every hidden layer.
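For reference, the batch normalization transform of [17] normalizes each activation over a mini-batch $B = \{x_1, \ldots, x_m\}$ and then applies a learned scale and shift:
$$
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \quad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \quad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \quad
y_i = \gamma \hat{x}_i + \beta,
$$
where $\gamma$ and $\beta$ are learned parameters and $\epsilon$ is a small constant added for numerical stability.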
Adjusting the weights of the bottom layers requires
back-propagating the error signal through many layers,
which gives rise to the phenomenon of vanishing gradients. A variety of approaches
and parameter setting methods, such as pre-training, have
been proposed to achieve better training of deep neural
networks. In conventional greedy layer-wise supervised pre-
training methods, each new hidden layer is trained as the
hidden layer of a single-hidden-layer supervised neural net-
work, with the input being the output of the previously
trained layers [8, 15, 18]. The output layer is then discarded,
and the trained hidden layer is used as the pre-training ini-
tialization. It is expected that this approach will yield a
preferable representation. However, the greedy layer-wise
supervised pre-training method may be too greedy: the
learned hidden-unit representation could neglect some
important information about the learning target when this
information is not easily captured by a single-hidden-layer
neural network, whereas such information could be
successfully acquired using deeper structures. In this paper,
based on the NIN [13] structure, we present a new DL
approach called mlpconv-wise supervised pre-training NIN
(MPNIN).
The central idea of MPNIN is to use integrated direct
supervised training in the hidden layers, rather than the stan-
dard approach of implementing supervised training only in
the output layer and back propagating this supervision infor-
mation to earlier layers. We implement this integrated direct
hidden layer supervised training by introducing mlpconv-
wise supervised pre-training to each hidden layer. An
mlpconv layer consists of a linear convolutional layer and a
two-layered MLP. Each mlpconv layer that is pre-trained
with supervision is used as the hidden layer of a single-
hidden-layer supervised neural network. During the super-
vised pre-training, each new mlpconv layer takes as input
the output of the previously trained mlpconv layers. We
use batch normalization to normalize the inputs and reduce
the effects of internal covariate shift. The output layer is
then discarded, and the trained mlpconv layer is used as
the initialized hidden layer. The experimental results in this
paper verify the robustness and discrimination ability of the
features learned by the proposed MPNIN model.
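A minimal sketch of this pre-training procedure is given below, under the simplifying assumption that previously trained mlpconv layers are frozen while a new layer is pre-trained; the block structure, hyperparameters, and helper names (mlpconv_bn_block, train_briefly, pretrain_mpnin) are illustrative only and do not reflect the exact architecture used in our experiments:

```python
import torch
import torch.nn as nn

def mlpconv_bn_block(in_ch, out_ch):
    """Batch-normalized mlpconv block: a linear convolution followed by a
    two-layer MLP realized as 1x1 convolutions, with each layer normalized."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

def train_briefly(module, loader, epochs=1, lr=0.01):
    """Short supervised training loop (cross-entropy, SGD) over the trainable parameters."""
    params = [p for p in module.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(module(x), y).backward()
            opt.step()

def pretrain_mpnin(block_specs, num_classes, loader):
    """Mlpconv-wise supervised pre-training: each new mlpconv block is trained as
    the hidden layer of a shallow supervised network whose input is the output of
    the previously trained blocks; the temporary output layer is then discarded."""
    trained = []
    for in_ch, out_ch in block_specs:
        for blk in trained:                       # assumption: earlier blocks stay frozen here
            for p in blk.parameters():
                p.requires_grad = False
        block = mlpconv_bn_block(in_ch, out_ch)
        head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                             nn.Linear(out_ch, num_classes))  # temporary output layer
        train_briefly(nn.Sequential(*trained, block, head), loader)
        trained.append(block)                     # keep the block, discard the head
    net = nn.Sequential(*trained)                 # pre-trained initialization of the network
    for p in net.parameters():
        p.requires_grad = True                    # unfreeze for whole-network fine-tuning
    return net
```

For example, pretrain_mpnin([(3, 192), (192, 160), (160, 96)], num_classes=10, loader=train_loader), with train_loader being a hypothetical data loader, would pre-train a three-stage stack for a ten-class problem; a final output layer is then attached and the whole network is fine-tuned with the supervised criterion.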
Our motivation for developing the proposed MPNIN
network, along with the novelty and contributions of this
research, can be summarized as follows.