Active Convolution: Learning the Shape of Convolution for Image Classification
Yunho Jeon
EE, KAIST
jyh2986@kaist.ac.kr
Junmo Kim
EE, KAIST
junmo.kim@kaist.ac.kr
Abstract
In recent years, deep learning has achieved great suc-
cess in many computer vision applications. Convolutional
neural networks (CNNs) have lately emerged as a major ap-
proach to image classification. Most research on CNNs thus
far has focused on developing architectures such as the In-
ception and residual networks. The convolution layer is the
core of the CNN, but few studies have addressed the convo-
lution unit itself. In this paper, we introduce a convolution
unit called the active convolution unit (ACU). This new convolution has no fixed shape, so we can define any form of convolution. Its shape can be learned through backpropagation during training. Our proposed unit has a
few advantages. First, the ACU is a generalization of convo-
lution; it can define not only all conventional convolutions,
but also convolutions with fractional pixel coordinates. We
can freely change the shape of the convolution, which pro-
vides greater freedom to form CNN structures. Second, the
shape of the convolution is learned during training, and there is no need to tune it by hand. Third, the ACU can learn better than a conventional unit: we obtained improvements simply by replacing conventional convolutions with ACUs. We tested our proposed method on plain and residual networks, and it showed significant improvements over the baselines on various datasets and architectures.
1. Introduction
Following the success of deep learning in the ImageNet
Large Scale Visual Recognition Challenge (ILSVRC) [20],
the best performance in classification competitions has al-
most invariably been achieved on convolutional neural net-
work (CNN) architectures. AlexNet [16] is composed of convolutions with three receptive field sizes (3 × 3, 5 × 5, and 11 × 11). VGG [21] is based on the idea that a stack of two convolutional layers with 3 × 3 receptive fields is more effective than a single 5 × 5 convolution (see the note after this paragraph). GoogLeNet [24, 25, 26]
introduced an Inception layer for the composition of various
receptive fields. The residual network [10, 11, 29], which
adds shortcut connections to implement identity mapping,
allows more layers to be stacked without running into the
vanishing gradient problem. Recent research on CNNs has mostly focused on composing layers rather than on the convolution unit itself.
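To make VGG's stacking argument concrete: two 3 × 3 convolutions stacked with stride 1 cover the same 5 × 5 region as a single 5 × 5 convolution, because each layer enlarges the receptive field by k − 1 = 2 pixels; with C input and output channels, the stack uses 2 · 3² · C² = 18C² weights instead of 5² · C² = 25C², and it inserts an additional nonlinearity between the two layers.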
Other basic units, such as activation and pooling units,
have been studied with many variations. The sigmoid [7] and tanh functions were the basic activations in early neural networks. The rectified linear unit (ReLU) [19] was suggested to overcome the vanishing gradient problem and achieved good results without pre-training. Since then, many variants of the ReLU have been suggested, such as the leaky ReLU (LReLU) [18], randomized LReLU [27], parametric ReLU [9], and the exponential linear unit [3]. Other types of activa-
tion units have been suggested to learn subnetworks, such
as Maxout [5] and local winner-take-all [23].
Pooling is another basic operation in CNNs, used to reduce resolution and enable translation invariance. Max and aver-
age pooling are the most popular methods. Spatial pyramid
pooling [8] was introduced to deal with inputs of varying
resolution. The ROI pooling method was used to speed up
detection [4]. Recently, fractional pooling [6] has been ap-
plied to image classification. Lee et al. [17] proposed a gen-
eral pooling method that combines pooling and convolution
units. On the other hand, Springenberg et al. [22] showed
that using only convolution units is sufficient without any
pooling.
However, only a few studies have considered convolu-
tion units themselves. Dilated convolution [1, 28] has been suggested for dense prediction in segmentation; because it enlarges the receptive field without downsampling, it reduces the post-processing needed to restore the resolution of the segmentation result. Permutohedral lattice convolution [14] expands the convolved dimensions from the spatial domain to the color domain, which enables the pairwise potentials of conditional random fields to be learned.
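Dilated convolution is exposed directly in common deep learning frameworks. The following is a minimal sketch, assuming a recent version of PyTorch; the tensor sizes are arbitrary illustrative choices.

import torch
import torch.nn as nn

# With dilation=2, the taps of a 3 x 3 kernel are spaced two pixels apart,
# so the kernel spans a 5 x 5 region while keeping only 9 weights per
# channel pair; padding=2 preserves the spatial resolution.
x = torch.randn(1, 16, 32, 32)  # (batch, channels, height, width)
dilated = nn.Conv2d(16, 16, kernel_size=3, dilation=2, padding=2)
y = dilated(x)
print(y.shape)  # torch.Size([1, 16, 32, 32])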
In this paper, we propose a new convolution unit. Unlike
conventional convolution and its variants, this unit does not have a fixed receptive field shape, and can be used to realize more diverse forms of receptive fields. Moreover, its shape can be learned during the training procedure (see the sketch below). Since the shape of the unit is deformable and
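To make the proposed idea concrete, the following is a minimal sketch, assuming a recent version of PyTorch. The class name ActiveConv2d, the use of grid_sample for bilinear interpolation, and the 3 × 3 grid initialization are illustrative assumptions, not the paper's reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActiveConv2d(nn.Module):
    """Convolution whose sampling positions (its shape) are learnable."""

    def __init__(self, in_ch, out_ch, num_points=9):
        super().__init__()
        assert num_points == 9, "this sketch initializes a 3 x 3 grid"
        # One weight per (output channel, input channel, sampling point).
        self.weight = nn.Parameter(0.01 * torch.randn(out_ch, in_ch, num_points))
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Learnable (x, y) offsets in pixels, shared across all spatial
        # locations; initialized to a conventional 3 x 3 grid.
        init = [(float(x), float(y)) for y in (-1, 0, 1) for x in (-1, 0, 1)]
        self.offsets = nn.Parameter(torch.tensor(init))  # (num_points, 2)

    def forward(self, x):
        n, _, h, w = x.shape
        # Base sampling grid in the normalized [-1, 1] coordinates used by
        # grid_sample; the last dimension is (x, y).
        ys = torch.linspace(-1.0, 1.0, h, device=x.device)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        base = torch.stack((gx, gy), dim=-1)  # (h, w, 2)
        # Conversion factor from pixel offsets to normalized coordinates.
        scale = x.new_tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1)])
        out = x.new_zeros(n, self.weight.shape[0], h, w)
        for k in range(self.offsets.shape[0]):
            grid = (base + self.offsets[k] * scale).expand(n, -1, -1, -1)
            # Bilinear interpolation at fractional positions; differentiable
            # with respect to both the input and the offsets.
            sampled = F.grid_sample(x, grid, mode="bilinear",
                                    padding_mode="zeros", align_corners=True)
            # Apply the k-th sampling point's weights as a 1 x 1 convolution.
            out = out + F.conv2d(sampled, self.weight[:, :, k, None, None])
        return out + self.bias.view(1, -1, 1, 1)

Because bilinear interpolation is differentiable with respect to the sampling coordinates, the offsets receive gradients just like the weights, which is what allows the shape of the convolution to be learned by backpropagation. For example, ActiveConv2d(16, 32)(torch.randn(1, 16, 8, 8)) produces a tensor of shape (1, 32, 8, 8).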