N-fold Superposition: Improving Neural Networks by
Reducing the Noise in Feature Maps
Yang Liu, Qiang Qu, Chao Gao
National Digital Switching System Engineering & Technological R&D Center, Zhengzhou 450002, China
fabyangliu@hotmail.com
Abstract—Considering that the use of the Fully Connected (FC) layer limits the performance of Convolutional Neural Networks (CNNs), this paper develops a method to improve the coupling between the convolution layer and the FC layer by reducing the noise in Feature Maps (FMs). Our approach consists of three steps. First, we divide all the FMs equally into n blocks. Then, a new block of FMs is formed by the weighted summation of the FMs at the same position in each block. Finally, we replicate this new block into n copies and concatenate them as the input to the FC layer. This sharing of FMs noticeably reduces the noise in them and prevents any single FM from dominating a specific part of the hidden-layer weights, thereby mitigating overfitting to some extent. Using Fermat's Lemma, we prove that this method widens the range of global minima of the loss function, which makes it easier for neural networks to converge and accelerates the convergence process. The method does not significantly increase the number of network parameters (only a few coefficients are added), and the experiments demonstrate that it increases the convergence speed and improves the classification performance of neural networks.
Keywords—Convolutional Neural Networks; deep learning;
image classification; n-fold superposition; feature map sharing;
hidden layer weight sharing
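As a rough illustration of the three-step procedure described in the abstract, the following NumPy sketch divides the FMs into n blocks, forms a shared block by weighted summation, and tiles that block n times as the FC-layer input. The function name, the fixed (rather than learned) coefficients, and the tensor layout are illustrative assumptions, not the authors' implementation.

import numpy as np

def n_fold_superposition(fms, n, weights=None):
    # fms: feature maps of shape (channels, height, width)
    c, h, w = fms.shape
    assert c % n == 0, "channel count must divide evenly into n blocks"
    # Step 1: separate the FMs equally into n blocks.
    blocks = fms.reshape(n, c // n, h, w)
    # Step 2: weighted summation of FMs at the same position in all blocks.
    if weights is None:
        weights = np.full(n, 1.0 / n)  # in the paper these coefficients are learnable
    shared = np.tensordot(weights, blocks, axes=1)  # shape (c // n, h, w)
    # Step 3: replicate the shared block n times and flatten as FC-layer input.
    return np.tile(shared, (n, 1, 1)).reshape(-1)

# Toy usage: 8 feature maps of size 4x4, folded with n = 2.
fc_input = n_fold_superposition(np.random.randn(8, 4, 4), n=2)
print(fc_input.shape)  # (128,)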
I. INTRODUCTION
Neural networks, especially Convolutional Neural Networks (CNNs), have shown remarkable performance in diverse domains of computer vision [1]. The impressive success of CNNs is mainly due to their specially designed network structure, e.g., convolution and pooling. Feature Maps (FMs), the abstract representation of the input image, are obtained by the convolution operation. By stacking convolution layers, high-level abstract feature representations can be obtained to understand and identify the input image [2].
However, the coupling between the convolution layer and the fully connected (FC) layer is the main reason conventional CNNs overfit to the data or are easily trapped in local minima, yielding poor predictions [3, 4]. Many methods have been developed in recent years to address these problems and improve the performance of CNNs. These methods mainly focus on modifying the network structure and on regularization strategies.
References [4-6] replace the conventional convolution structure with a more powerful nonlinear function approximator, which helps convolution layers capture a higher level of abstraction. However, this increases the number of network parameters and the computational complexity. The pooling layer is usually used to abstract FMs and reduce overfitting; the commonly used pooling methods are max pooling and average pooling [7]. Global average pooling [4] has been successfully used in most well-known convolutional neural networks [8-10]. This method sums out the spatial information of each FM, thereby reinforcing similarities while reducing differences in the spatial information, and makes the model more robust to spatial translations of the input (which is very desirable). Reference [4] even tried to replace the FC layer with this method to remove the effect of the FC layer on the classification performance, but this makes it harder for neural networks to learn and thus slows down the convergence process.
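For concreteness, global average pooling reduces each FM to a single value by averaging over its spatial dimensions; a minimal sketch (a generic illustration, not the exact implementation of [4]):

import numpy as np

def global_average_pooling(fms):
    # fms: (channels, height, width) -> one averaged value per channel
    return fms.mean(axis=(1, 2))

scores = global_average_pooling(np.random.randn(10, 6, 6))
print(scores.shape)  # (10,) -- e.g., one value per category when it replaces the FC layer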
The softmax loss function is widely used in most CNNs [11]; nevertheless, it is biased toward the sample distribution, so it has also become a major target of improvement for researchers. By adding a decision variable to the softmax loss, the loss function can explicitly encourage intra-class compactness and inter-class separability of the learned features while avoiding overfitting [12]. However, this method requires repeated fine-tuning, which makes training difficult. A regularization term can also be added as a constraint to the loss function to prevent the model from overfitting, such as the squared L2 norm constraint on the weights [13]. This kind of regularization method aims to shrink the network weights and simplify the model to reduce overfitting. Dropout [14, 15] randomly drops units (along with their connections) from the network during training. Dropout forces randomly selected subsets of neurons to work together, which prevents units from co-adapting too much and improves the generalization ability of models.
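As a minimal sketch of the two regularizers mentioned above, the following functions show a squared L2 penalty added to the loss and (inverted) dropout masking during training; the function names and the inverted-dropout rescaling are illustrative assumptions rather than details taken from [13-15].

import numpy as np

def l2_penalty(weights, lam):
    # Squared L2 norm constraint: encourages smaller network weights.
    return lam * np.sum(weights ** 2)

def dropout(activations, p, training=True):
    # Randomly zero units with probability p during training;
    # inverted dropout rescales so no change is needed at test time.
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) >= p) / (1.0 - p)
    return activations * mask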
Other strategies also exist. Batch normalization [16] reduces internal covariate shift by normalizing the input distribution of every layer to zero mean and unit variance. Initialization methods [17, 18] derive more robust initializations that explicitly account for the nonlinearities in neural networks. Data augmentation methods [19, 20] can make the model more robust and prevent overfitting when the training set is limited. Besides the modification of the network structure and regularization, these methods also provide valid approaches to improving the performance of neural networks.
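For completeness, a minimal sketch of the batch normalization step mentioned above, normalizing each feature over a mini-batch; the learnable scale/shift parameters and running statistics of [16] are omitted here.

import numpy as np

def batch_norm(x, eps=1e-5):
    # x: (batch, features); normalize each feature to zero mean, unit variance.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

normalized = batch_norm(np.random.randn(32, 64))  # mini-batch of 32 samples, 64 features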