Towards Effective Low-bitwidth Convolutional Neural Networks

Bohan Zhuang¹,², Chunhua Shen¹,²,∗, Mingkui Tan³, Lingqiao Liu¹, Ian Reid¹,²
¹The University of Adelaide, Australia   ²Australian Centre for Robotic Vision   ³South China University of Technology, China
{bohan.zhuang,chunhua.shen,lingqiao.liu,ian.reid}@adelaide.edu.au, mingkuitan@scut.edu.cn
∗C. Shen is the corresponding author.
Abstract
This paper tackles the problem of training a deep con-
volutional neural network with both low-precision weights
and low-bitwidth activations. Optimizing a low-precision
network is very challenging since the training process can
easily get trapped in a poor local minimum, which results in
substantial accuracy loss. To mitigate this problem, we pro-
pose three simple-yet-effective approaches to improve the
network training. First, we propose to use a two-stage
optimization strategy to progressively find good local min-
ima. Specifically, we propose to first optimize a net with
quantized weights and then quantized activations. This is
in contrast to the traditional methods which optimize them
simultaneously. Second, following a similar spirit of the
first method, we propose another progressive optimization
approach which progressively decreases the bit-width from
high-precision to low-precision during the course of train-
ing. Third, we adopt a novel learning scheme to jointly train
a full-precision model alongside the low-precision one. By
doing so, the full-precision model provides hints to guide
the low-precision model training. Extensive experiments
on various datasets (i.e., CIFAR-100 and ImageNet) show
the effectiveness of the proposed methods. Notably, using
our methods to train a 4-bit precision network leads to no
performance decrease in comparison with its full-precision
counterpart on standard network architectures (i.e., AlexNet
and ResNet-50).
1. Introduction
The state-of-the-art deep neural networks [9, 17, 26] usu-
ally involve millions of parameters and need billions of
FLOPs during computation. Such memory and computational
costs can be unaffordable for mobile hardware devices, especially
when deploying deep neural networks on chips.
To improve computational and memory efficiency, various
solutions have been proposed, including pruning network
weights [7, 8], low-rank approximation of weights [16, 34],
and training a low-bit-precision network [4, 36–38].
In this work, we follow the idea of training a low-precision
network and our focus is to improve the training process
of such a network. Note that in the literature, many works
adopt this idea but only attempt to quantize the weights of
a network while keeping the activations in 32-bit floating
point [4, 19, 36, 38]. Although this treatment leads to a
smaller performance decrease compared with the full-precision
counterpart, it still requires substantial computational
resources to handle the full-precision activations.
Thus, our work targets the problem of training a network with
both low-bit quantized weights and activations.
The solutions proposed in this paper contain three com-
ponents. They can be applied independently or jointly. The
first method is to adopt a two-stage training process. In the
first stage, only the weights of the network are quantized. After
obtaining a sufficiently good solution in the first stage, the
activations of the network are additionally quantized to low
precision and the network is trained again. Essentially, this
progressive approach first solves a related sub-problem, i.e.,
training a network with only low-bit weights, and the solution
of the sub-problem provides a good initialization for our
target problem.
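To make the schedule concrete, the following is a minimal sketch of how such a two-stage procedure could be organized; the DoReFa-style quantizer, the QuantConv2d layer, the epoch budgets and the train() helper are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the two-stage schedule, assuming a DoReFa-style uniform
# quantizer with a straight-through estimator (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(x, k):
    """Uniformly quantize a tensor in [0, 1] to k bits; the gradient
    passes through unchanged (straight-through estimator)."""
    n = float(2 ** k - 1)
    xq = torch.round(x * n) / n
    return x + (xq - x).detach()

class QuantConv2d(nn.Conv2d):
    """Convolution whose weights (and, optionally, inputs) are quantized on the fly."""
    def __init__(self, *args, w_bits=4, a_bits=4, quantize_acts=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.w_bits, self.a_bits = w_bits, a_bits
        self.quantize_acts = quantize_acts  # stage 1: False, stage 2: True

    def forward(self, x):
        # map weights to [0, 1], quantize, then map back to [-1, 1]
        w = torch.tanh(self.weight)
        w = w / (2 * w.abs().max()) + 0.5
        w = 2 * quantize(w, self.w_bits) - 1
        if self.quantize_acts:
            x = quantize(torch.clamp(x, 0, 1), self.a_bits)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

def train(model, loader, epochs):
    """Placeholder for an ordinary SGD training loop."""
    ...

def two_stage_training(model, loader):
    # Stage 1: quantized weights, full-precision activations.
    for m in model.modules():
        if isinstance(m, QuantConv2d):
            m.quantize_acts = False
    train(model, loader, epochs=30)
    # Stage 2: switch activation quantization on and fine-tune the stage-1 solution.
    for m in model.modules():
        if isinstance(m, QuantConv2d):
            m.quantize_acts = True
    train(model, loader, epochs=30)
```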
Following a similar idea, we propose our second method, which
performs progressive training on the bit-width aspect of the
network. Specifically, we incrementally train a series of
networks with the quantization bit-width (precision) gradually
decreased from full precision to the target precision.
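This precision schedule can be sketched in the same vein, reusing the hypothetical QuantConv2d and train() helpers from the sketch above; the concrete bit-width sequence and per-stage epoch budget are illustrative assumptions.

```python
# Sketch of the progressive bit-width schedule; the model is assumed to start
# from a full-precision pretrained state, and every stage warm-starts from the
# weights learned at the previous (higher) precision.
def progressive_precision_training(model, loader, schedule=(16, 8, 4)):
    for bits in schedule:
        for m in model.modules():
            if isinstance(m, QuantConv2d):
                m.w_bits = m.a_bits = bits
                m.quantize_acts = True
        train(model, loader, epochs=10)
```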
The third method is inspired by the recent progress of mutual
learning [35] and information distillation [1, 11, 22, 24, 32]. The
basic idea of those works is to train a target network along-
side another guidance network. For example, the works in
[1, 11, 22, 24, 32] propose to train a small student network
to mimic a deeper or wider teacher network. They add an
additional regularizer that minimizes the difference between
the student’s and the teacher’s posterior probabilities [11] or
intermediate feature representations [1, 24].
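As an illustration, a posterior-matching regularizer in the spirit of [11] can be written as a weighted combination of the usual cross-entropy loss and a KL term between the softened teacher and student outputs; the temperature T and mixing weight alpha below are illustrative choices, not values taken from these works.

```python
# Sketch of a guidance regularizer in the spirit of the posterior matching
# in [11]; T (temperature) and alpha (mixing weight) are illustrative.
import torch.nn.functional as F

def guided_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kd
```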
It is observed that by using the guidance of the teacher model,
better performance can be obtained with the student model than directly