2019 ICLR：卷积网络中变形稳定性并非必需且不充分

需积分: 0 119 浏览量更新于2024-08-05 收藏 4.54MB PDF 举报

在2019年的国际计算机视觉与模式识别（ICLR）会议上，一篇匿名作者提交的论文《Pooling Is Neither Necessary nor Sufficient for Appropriate Deformation Stability in CNNs》引起了广泛关注。这篇处于双盲评审阶段的论文探讨了深度学习领域的一个核心假设——卷积神经网络（CNN）在解决图像识别任务时，对小规模的平移和变形是否具有内在稳定性。长久以来，为了实现这种稳定性，设计者们倾向于在CNN架构中嵌入交替池化层。然而，随着近年来对这一做法的质疑，该研究提出了新的问题：我们对于网络对变形的稳定性的直觉是否正确？这种稳定性真的重要吗？以及如果没有池化层，网络如何实现变形不变性？论文深入地分析了这些疑问，并得出了一些关键发现： 1. **变形稳定性并非二元属性**：论文指出，不同的图像识别任务可能对网络的变形稳定性有不同的需求。这暗示着一种观点，即单一的稳定度标准并不适用于所有场景，网络应当根据任务的特性灵活调整其对变形的敏感程度。 2. **池化功能的重新评估**：以往普遍认为池化是确保网络对变形不变的基础，但研究表明，这种依赖并非必需。这意味着即使没有池化层，CNN也能通过其他机制实现一定程度的变形不变性。 3. **网络内部机制的探究**：研究者探索了在没有池化的情况下，网络是如何在不同层级上实现不同程度的变形稳定性的。这可能涉及到更复杂的特征学习策略，如卷积层的权重分布、网络结构的设计，或者非局部运算的引入等。 4. **实验验证的重要性**：论文强调了严谨的实验方法在验证这些理论假设中的关键作用。通过对大量数据集进行细致的实验，研究人员得以揭示变形稳定性在实际应用中的真实表现。 5. **对未来的启示**：该研究的结果对未来的CNN设计和优化具有深远影响，可能会推动研究人员重新思考网络架构中的池化设计，寻找更高效且适应性更强的变形不变性解决方案。这篇论文挑战了关于CNN中变形稳定性的传统认识，提出了一个更加精细和动态的观点，并提供了实证证据来支持这一论断。这对于深入理解深度学习模型的工作原理，特别是在处理视觉任务中的变形不变性问题具有重要意义。

Under review as a conference paper at ICLR 2019

(a) (b)

Figure 1: (a)

Generating deformed images

: To randomly deform an image we: (i) Start with a ﬁxed

evenly spaced grid of control points (here 4x4 control points) and then choose a random source for

each control point within a neighborhood of the point; (ii) we then smooth the resulting vector ﬁeld

using thin plate interpolation; (iii) vector ﬁeld overlayed on original image: the value in the ﬁnal

result at the tip of an arrow is computed using bilinear interpolation of values in a neighbourhood

around the tail of the arrow in the original image; (iv) the ﬁnal result. (b)

Examples of deformed

ImageNet images.

left: original images, right: deformed images. While the images have changed

signiﬁcantly, for example under the

metric, they would likely be given the same label by a human.

Empirical investigations.

Previous empirical investigations of these phenomena in CNNs include

the work of Lenc & Vedaldi (2015), which focused on a more limited set of invariances such as

global afﬁne transformations. More recently, there has been interest in the robustness of networks

to adversarial geometric transformations in the work of Fawzi & Frossard (2015) and Kanbak et al.

(2017). In particular, these studies looked at worst-case sensitivity of the output to such transforma-

tions, and found that CNNs can indeed be quite sensitive to particular geometric transformations (a

phenomenon that can be mitigated by augmenting the training sets). However, this line of work does

not address how deformation sensitivity is generally achieved in the ﬁrst place, and how it changes

over the course of training. In addition, these investigations have been restricted to a limited class of

deformations, which we seek to remedy here.

2 METHODS

2.1 DEFORMATION SENSITIVITY

In order to study how CNN representations are affected by image deformations we ﬁrst need a

controllable source of deformation. Here, we choose a ﬂexible class of local deformations of image

coordinates, i.e., maps

τ : R

→ R

such that

k∇τk

∞

< C

for some

, similar to Mallat (2012).

We choose this class for several reasons. First, it subsumes or approximates many of the canonical

forms of image deformation we would want to be robust to, including:

• Pose: Small shifts in pose or location of subparts

• Afﬁne transformations: translation, scaling, rotation or shear

• Thin-plate spline transforms

• Optical ﬂow: Roth & Black (2007); Rosenbaum et al. (2013)

We show examples of several of these in Section 2 of the supplementary material.

This class also allows us to modulate the strength of image deformations, which we deploy to

investigate how task demands are met by CNNs. Furthermore, this class of deformations approximates

most of the commonly used methods of data augmentation for object recognition Simard et al. (2003);

Wong et al. (2016); Cire¸san et al. (2010).

While it is in principle possible to explore ﬁner-grained distributions of deformation (e.g., choosing

adversarial deformations to maximally shift the representation), we think our approach offers good

coverage over the space, and a reasonable ﬁrst order approximations to the class of natural deforma-

tions. We leave the study of richer transformations—such as those requiring a renderer to produce or

those chosen adversarially Fawzi & Frossard (2015); Kanbak et al. (2017)—as future work.

剩余11页未读，继续阅读

三山卡夫卡

粉丝: 24
资源: 323

2019 ICLR：卷积网络中变形稳定性并非必需且不充分

2019-ICLR-Graph Generation via Scattering-作者信息-rrrrr1

2019-ICLR-Confidence-based Graph Convolutional Networks for Semi

2019-ICLR-DEEP GENERATIVE MODELS FOR GENERATING LABELED GRAPHS-R

2019-ICLR-GENERATIVE MODELS FOR GRAPH-BASED PROTEIN DESIGN-RRRR-

2019-ICLR-CAPSULE GRAPH NEURAL NETWORK-网文-rrr1

2019-ICLR-Graph Classification with Geometric Scattering-作者信息-rr

2019-ICLR-Graph Classification with Geometric Scattering-基于矩阵的小波

2019-ICLR-百度-DEEP GEOMETRICAL GRAPH CLASSIFICATION-游走向量化+GNN+图下采

protein-sequence-embedding-iclr2019:“使用来自结构的信息学习蛋白质序列嵌入”的源代码-ICLR 2019-Source code learning

阅读理解-2020-ICLR-Transformer-XH- Multi-Evidence Reasoning with eXt

最新资源