FractalNet: Ultra-Deep Neural Networks without Residuals
Gustav Larsson
University of Chicago
larsson@cs.uchicago.edu
Michael Maire
TTI Chicago
mmaire@ttic.edu
Gregory Shakhnarovich
TTI Chicago
greg@ttic.edu
Abstract
We introduce a design strategy for neural network macro-architecture based on self-
similarity. Repeated application of a single expansion rule generates an extremely
deep network whose structural layout is precisely a truncated fractal. Such a
network contains interacting subpaths of different lengths, but does not include
any pass-through connections: every internal signal is transformed by a filter and
nonlinearity before being seen by subsequent layers. This property stands in stark
contrast to the current approach of explicitly structuring very deep networks so that
training is a residual learning problem. Our experiments demonstrate that residual
representation is not fundamental to the success of extremely deep convolutional
neural networks. A fractal design achieves an error rate of 22.85% on CIFAR-100,
matching the state-of-the-art held by residual networks.
Fractal networks exhibit intriguing properties beyond their high performance. They
can be regarded as a computationally efficient implicit union of subnetworks of
every depth. We explore consequences for training, touching upon a connection
with student-teacher behavior and, most importantly, demonstrating the ability to
extract high-performance fixed-depth subnetworks. To facilitate this latter task, we
develop drop-path, a natural extension of dropout, to regularize co-adaptation of
subpaths in fractal architectures. With such regularization, fractal networks exhibit
an anytime property: shallow subnetworks provide a quick answer, while deeper
subnetworks, with higher latency, provide a more accurate answer.
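For illustration only (this sketch is ours, not the implementation used in the experiments), the core idea of drop-path is that, at each join between parallel subpaths, some incoming paths are randomly dropped during training and the survivors are merged, so no subpath can rely on the co-presence of another. A minimal Python sketch, assuming the join is an element-wise mean and using an illustrative drop probability:

import random

def drop_path_join(path_outputs, drop_prob=0.15, training=True):
    # Join outputs of parallel subpaths by element-wise averaging.
    # During training, each incoming path is dropped independently with
    # probability drop_prob, but at least one path is always kept so the
    # join still produces an output; at inference, all paths are averaged.
    # (Sketch only: drop_prob is illustrative, and the paper also describes
    # a global variant that activates a single column at a time.)
    if training:
        kept = [p for p in path_outputs if random.random() > drop_prob]
        if not kept:
            kept = [random.choice(path_outputs)]
    else:
        kept = list(path_outputs)
    return sum(kept) / len(kept)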
1 Introduction
ResNet [8] marks a recent and dramatic increase in both depth and accuracy of convolutional neural
networks, facilitated by constraining the network to learn residuals. ResNet variants [8, 9, 11] and
related architectures [31] employ the common technique of initializing and anchoring, via a pass-
through channel, a network to the identity function. Training now differs in two respects. First, the
objective changes to learning residual outputs, rather than unreferenced absolute mappings. Second,
these networks exhibit a type of deep supervision [18], as near-identity layers effectively reduce
distance to the loss. He et al. [8] speculate that the former, the residual formulation itself, is crucial.
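To make the first of these two respects concrete (the notation here is ours, added for illustration): a block with input x and a pass-through channel only needs to learn the residual F(x),

    y = x + F(x),

whereas a conventional block must learn the full, unreferenced mapping

    y = H(x).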
We show otherwise, by constructing a competitive extremely deep architecture that does not rely on
residuals. Our design principle is pure enough to communicate in a single word, fractal, and a simple
diagram (Figure 1). Yet, fractal networks implicitly recapitulate many properties hard-wired into
previous successful architectures. Deep supervision not only arises automatically, but also drives a
type of student-teacher learning [1, 34] internal to the network. Modular building blocks of other
designs [32, 20] are almost special cases of a fractal network's nested substructure.
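To give a flavor of that nested substructure, the following is a minimal Python sketch of the self-similar expansion rule (a toy rendering of ours: the stand-in conv_block, the averaging join, and the scalar usage example are assumptions for illustration; the precise rule and join operation are specified later in the paper). The base case is a single layer; each expansion places two stacked copies of the previous structure alongside one new layer and merges them with a join:

def fractal(num_columns, conv_block, join):
    # Base case (one column): a single conv_block.
    # Expansion rule: f_{C+1}(z) = join(f_C(f_C(z)), conv_block(z)),
    # i.e. two stacked copies of the previous fractal in parallel with one
    # new layer. In a real network every conv_block call would be a distinct
    # parameterized layer; a single stand-in function keeps the sketch minimal.
    if num_columns == 1:
        return conv_block
    prev = fractal(num_columns - 1, conv_block, join)
    return lambda z: join(prev(prev(z)), conv_block(z))

# Toy usage with scalar "signals": the deepest path through f3 applies the
# stand-in layer 2**(3-1) = 4 times, while the shallowest applies it once.
f3 = fractal(3, conv_block=lambda z: z + 1.0, join=lambda a, b: (a + b) / 2.0)
print(f3(0.0))  # 2.0 for this toy choice of conv_block and join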
For fractal networks, simplicity of training mirrors simplicity of design. A single loss, attached to the
final layer, suffices to drive internal behavior mimicking deep supervision. Parameters are randomly
initialized. As they contain subnetworks of many depths, fractal networks are robust to choice of
overall depth; make them deep enough and training will carve out a useful assembly of subnetworks.