Perhaps the most widely studied CNN macroarchitecture topic in the recent literature is the impact
of depth (i.e. the number of layers) in networks. Simonyan and Zisserman proposed the VGG (Simonyan
& Zisserman, 2014) family of CNNs with 12 to 19 layers and reported that deeper networks produce
higher accuracy on the ImageNet-1k dataset (Deng et al., 2009). K. He et al. proposed deeper CNNs
with up to 30 layers that deliver even higher ImageNet accuracy (He et al., 2015a).
The choice of connections across multiple layers or modules is an emerging area of CNN macroar-
chitectural research. Residual Networks (ResNet) (He et al., 2015b) and Highway Networks (Sri-
vastava et al., 2015) each propose the use of connections that skip over multiple layers, for example
additively connecting the activations from layer 3 to the activations from layer 6. We refer to these
connections as bypass connections. The authors of ResNet provide an A/B comparison of a 34-layer
CNN with and without bypass connections; adding bypass connections delivers a 2 percentage-point
improvement on Top-5 ImageNet accuracy.
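To make the notion of a bypass connection concrete, the following sketch (an illustration in PyTorch, not the authors' implementation; the layer sizes are hypothetical) adds the activations of an earlier layer to those of a later layer, skipping the layers in between:
```python
# Illustrative sketch of an additive bypass (skip) connection; layer sizes are
# hypothetical and this is not the exact ResNet or Highway Network architecture.
import torch
import torch.nn as nn

class BypassBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # the input x skips over conv_a and conv_b and is added back in additively
        return self.relu(x + self.conv_b(self.relu(self.conv_a(x))))

y = BypassBlock(64)(torch.randn(1, 64, 56, 56))  # output shape: (1, 64, 56, 56)
```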
2.4 NEURAL NETWORK DESIGN SPACE EXPLORATION
Neural networks (including deep and convolutional NNs) have a large design space, with numerous
options for microarchitectures, macroarchitectures, solvers, and other hyperparameters. It seems
natural that the community would want to gain intuition about how these factors impact a NN’s
accuracy (i.e. the shape of the design space). Much of the work on design space exploration (DSE)
of NNs has focused on developing automated approaches for finding NN architectures that deliver
higher accuracy. These automated DSE approaches include Bayesian optimization (Snoek et al.,
2012), simulated annealing (Ludermir et al., 2006), randomized search (Bergstra & Bengio, 2012),
and genetic algorithms (Stanley & Miikkulainen, 2002). To their credit, each of these papers pro-
vides a case in which the proposed DSE approach produces a NN architecture that achieves higher
accuracy compared to a representative baseline. However, these papers make no attempt to provide
intuition about the shape of the NN design space. Later in this paper, we eschew automated
approaches; instead, we refactor CNNs in such a way that we can perform principled A/B comparisons
to investigate how CNN architectural decisions influence model size and accuracy.
In the following sections, we first propose and evaluate the SqueezeNet architecture with and with-
out model compression. Then, we explore the impact of design choices in microarchitecture and
macroarchitecture for SqueezeNet-like CNN architectures.
3 SQUEEZENET: PRESERVING ACCURACY WITH FEW PARAMETERS
In this section, we begin by outlining our design strategies for CNN architectures with few param-
eters. Then, we introduce the Fire module, our new building block from which to build CNN
architectures. Finally, we use our design strategies to construct SqueezeNet, which consists
mainly of Fire modules.
3.1 ARCHITECTURAL DESIGN STRATEGIES
Our overarching objective in this paper is to identify CNN architectures that have few parameters
while maintaining competitive accuracy. To achieve this, we employ three main strategies when
designing CNN architectures:
Strategy 1. Replace 3x3 filters with 1x1 filters. Given a budget of a certain number of convolution
filters, we will choose to make the majority of these filters 1x1, since a 1x1 filter has 9X fewer
parameters than a 3x3 filter.
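As a concrete illustration of this saving (a PyTorch sketch with hypothetical channel and filter counts, not taken from the paper), compare the parameter counts of a 3x3 and a 1x1 filter bank with the same number of filters:
```python
# Illustrative comparison of parameter counts for 3x3 vs 1x1 convolution filters;
# the channel and filter counts (64 each) are hypothetical.
import torch.nn as nn

in_channels, num_filters = 64, 64

conv3x3 = nn.Conv2d(in_channels, num_filters, kernel_size=3, bias=False)
conv1x1 = nn.Conv2d(in_channels, num_filters, kernel_size=1, bias=False)

params3x3 = sum(p.numel() for p in conv3x3.parameters())  # 64*64*3*3 = 36864
params1x1 = sum(p.numel() for p in conv1x1.parameters())  # 64*64*1*1 = 4096

print(params3x3 // params1x1)  # 9, i.e. a 1x1 filter has 9X fewer parameters
```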
Strategy 2. Decrease the number of input channels to 3x3 filters. Consider a convolution layer
that is comprised entirely of 3x3 filters. The total quantity of parameters in this layer is (number of
input channels) * (number of filters) * (3*3). So, to maintain a small total number of parameters
in a CNN, it is important not only to decrease the number of 3x3 filters (see Strategy 1 above), but
also to decrease the number of input channels to the 3x3 filters. We decrease the number of input
channels to 3x3 filters using squeeze layers, which we describe in the next section.
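To illustrate Strategy 2 (a sketch with hypothetical channel counts, not the paper's exact layer sizes), the count (number of input channels) * (number of filters) * (3*3) shrinks substantially when a 1x1 squeeze layer first reduces the number of input channels seen by the 3x3 filters:
```python
# Illustrative sketch: a 1x1 "squeeze" layer reduces the number of input channels
# feeding into 3x3 filters, shrinking (input channels) * (filters) * (3*3).
# All channel and filter counts below are hypothetical.
def conv_params(in_channels, num_filters, k):
    return in_channels * num_filters * k * k  # ignoring bias terms

# Without squeezing: the 3x3 filters see all 128 input channels.
print(conv_params(128, 64, 3))   # 128*64*9 = 73728 parameters

# With squeezing: a 1x1 layer first maps 128 channels down to 16,
# so the 3x3 filters only see 16 input channels.
print(conv_params(128, 16, 1))   # squeeze layer: 128*16*1 = 2048 parameters
print(conv_params(16, 64, 3))    # 3x3 layer:      16*64*9 = 9216 parameters
```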
Strategy 3. Downsample late in the network so that convolution layers have large activation
maps. In a convolutional network, each convolution layer produces an output activation map with
a spatial resolution that is at least 1x1 and often much larger than 1x1. The height and width of
these activation maps are controlled by: (1) the size of the input data (e.g. 256x256 images) and (2)
the choice of layers in which to downsample in the CNN architecture.
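As a small illustration of Strategy 3 (hypothetical stride schedules, not the paper's architecture), placing stride-2 downsampling later in a network leaves the earlier convolution layers with larger activation maps:
```python
# Illustrative sketch: how the placement of stride-2 downsampling affects the
# spatial size of each layer's activation map for a 256x256 input.
# The stride schedules below are hypothetical.
def activation_sizes(input_size, strides):
    sizes, size = [], input_size
    for stride in strides:
        size //= stride
        sizes.append(size)
    return sizes

print(activation_sizes(256, [2, 2, 2, 1, 1]))  # early downsampling: [128, 64, 32, 32, 32]
print(activation_sizes(256, [1, 1, 2, 2, 2]))  # late downsampling:  [256, 256, 128, 64, 32]
```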