AdderNet: Do We Really Need Multiplications in Deep Learning?
Hanting Chen1,2∗, Yunhe Wang2∗, Chunjing Xu2†, Boxin Shi3,4, Chao Xu1, Qi Tian2, Chang Xu5
1 Key Lab of Machine Perception (MOE), Dept. of Machine Intelligence, Peking University.
2 Noah’s Ark Lab, Huawei Technologies.
3 NELVT, Dept. of CS, Peking University.
4 Peng Cheng Laboratory.
5 School of Computer Science, Faculty of Engineering, The University of Sydney.
{htchen, shiboxin}@pku.edu.cn, xuchao@cis.pku.edu.cn, c.xu@sydney.edu.au
{yunhe.wang, xuchunjing, tian.qi1}@huawei.com
∗ Equal contribution. † Corresponding author.
Abstract
Compared with cheap addition operations, multiplications are of much higher computational complexity. The widely-used convolutions in deep neural networks are exactly cross-correlations that measure the similarity between input features and convolution filters, which involves massive multiplications between floating-point values. In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the ℓ1-norm distance between filters and the input feature as the output response. The influence of this new similarity measure on the optimization of neural networks is thoroughly analyzed. To achieve better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron’s gradient. As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in the convolutional layers. The code is publicly available at: https://github.com/huawei-noah/AdderNet.
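To make the idea concrete, the following minimal sketch (PyTorch-style Python; the name adder2d_naive is illustrative and this is not the authors’ optimized kernel) computes an adder-layer output as the negative ℓ1 distance between each input patch and each filter, so the response is produced using only subtractions, absolute values, and additions:

import torch
import torch.nn.functional as F

def adder2d_naive(x, weight, stride=1, padding=0):
    # x: (N, C_in, H, W) input features; weight: (C_out, C_in, K, K) filters.
    # Each output value is the negative l1 distance between one input patch and
    # one filter, so activations and weights are never multiplied together.
    n, c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape
    patches = F.unfold(x, k, padding=padding, stride=stride)   # (N, C_in*K*K, L)
    patches = patches.transpose(1, 2)                          # (N, L, C_in*K*K)
    filters = weight.view(c_out, -1)                           # (C_out, C_in*K*K)
    # Broadcast to (N, L, C_out, C_in*K*K); reduce with abs + sum only.
    out = -(patches.unsqueeze(2) - filters.view(1, 1, c_out, -1)).abs().sum(dim=-1)
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    return out.transpose(1, 2).reshape(n, c_out, h_out, w_out)

Such a layer keeps the same filter shapes and output dimensions as a standard convolution, which is what allows it to serve as a drop-in replacement in architectures like ResNet-50.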
1. Introduction
Given the advent of Graphics Processing Units (GPUs), deep convolutional neural networks (CNNs) with billions of floating-point multiplications could receive speed-ups and make important strides in a large variety of computer vision tasks, e.g. image classification [26, 17], object detection [23], segmentation [19], and human face verification [32]. However, the high power consumption of these high-end GPU cards (e.g. 250W+ for a GeForce RTX 2080 Ti) has blocked modern deep learning systems from being deployed on mobile devices, e.g. smart phones, cameras, and watches. Existing GPU cards are far from svelte and cannot be easily mounted on mobile devices. Though the GPU itself only takes up a small part of the card, much other supporting hardware is needed, e.g. memory chips, power circuitry, voltage regulators, and other controller chips. It is therefore necessary to study efficient deep neural networks that can run with affordable computation resources on mobile devices.
Addition, subtraction, multiplication and division are the four most basic operations in mathematics. It is widely known that multiplication is slower than addition, but most of the computations in deep neural networks are multiplications between float-valued weights and float-valued activations during the forward inference. There are thus many papers on how to trade multiplications for additions to speed up deep learning. The seminal work [5] proposed BinaryConnect to force the network weights to be binary (e.g. -1 or 1), so that many multiply-accumulate operations can be replaced by simple accumulations. After that, Hubara et al. [15] proposed BNNs, which binarized not only the weights but also the activations of convolutional neural networks at run-time. Moreover, Rastegari et al. [22] introduced scale factors to approximate convolutions using binary operations
and outperformed [5, 15] by large margins. Zhou et al. [39]
utilized low bit-width gradients to accelerate the training of binarized networks. Cai et al. [4] proposed a half-wave Gaussian quantizer for forward approximation, which achieved performance much closer to that of full-precision networks.
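As a rough illustration of this line of work (a sketch of the BinaryConnect-style idea, not the exact procedure of [5]; the name binary_mac is hypothetical), binarizing the weights collapses each multiply-accumulate into a sign-controlled accumulation:

import numpy as np

def binary_mac(x, w_real):
    # Binarize real-valued weights to {-1, +1} via their sign.
    w_bin = np.where(w_real >= 0, 1.0, -1.0)
    acc = 0.0
    for xi, wb in zip(x, w_bin):
        # x_i * w_i becomes either +x_i or -x_i: accumulation only, no multiplication.
        if wb > 0:
            acc += xi
        else:
            acc -= xi
    return acc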
Though binarizing filters of deep neural networks sig-
nificantly reduces the computation cost, the original recog-
nition accuracy often cannot be preserved. In addition,
the training procedure of binary networks is not stable and
usually requires a slower convergence speed with a small