Fixed-point Factorized Networks
Peisong Wang 1,2 and Jian Cheng 1,2,3∗
1 Institute of Automation, Chinese Academy of Sciences
2 University of Chinese Academy of Sciences
3 Center for Excellence in Brain Science and Intelligence Technology, CAS
{peisong.wang, jcheng}@nlpr.ia.ac.cn
∗ The corresponding author.
Abstract
In recent years, Deep Neural Network (DNN) based methods have achieved remarkable performance in a wide range of tasks and have been among the most powerful and widely used techniques in computer vision. However, DNN-based methods are both computation-intensive and resource-consuming, which hinders the application of these methods on embedded systems such as smart phones. To alleviate this problem, we introduce a novel Fixed-point Factorized Network (FFN) framework for pretrained models to reduce the computational complexity as well as the storage requirement of networks. The resulting networks have weights of only -1, 0 and 1, which significantly eliminates the most resource-consuming multiply-accumulate operations (MACs). Extensive experiments on the large-scale ImageNet classification task show that the proposed FFN requires only one-thousandth of the multiply operations with comparable accuracy.
1. Introduction
Deep neural networks (DNNs) have recently been setting new state-of-the-art performance in many fields, including computer vision, speech recognition, and natural language processing. Convolutional neural networks (CNNs), in particular, have outperformed traditional machine learning algorithms on computer vision tasks such as image recognition, object detection, semantic segmentation, and gesture and action recognition. These breakthroughs are partially due to the added computational complexity and storage footprint, which make these models very hard to train as well as to deploy. For example, AlexNet [20] involves 61M floating-point parameters and 725M high-precision multiply-accumulate operations (MACs). Current DNNs are usually trained offline by utilizing specialized hardware such as NVIDIA GPUs and CPU
clusters. However, such an amount of computation may be unaffordable for portable devices such as mobile phones, tablets and wearable devices, which usually have limited computing resources. Moreover, the huge storage requirement and large number of memory accesses may hinder efficient hardware implementations of neural networks, such as on FPGAs and neural-network-oriented chips.
To speed up test-phase computation of deep models, many matrix and tensor factorization based methods have recently been investigated by the community [5, 15, 32, 21, 18, 30]. However, these methods commonly utilize full-precision weights, which are hardware-unfriendly, especially for embedded systems. Moreover, their low compression ratios hinder the application of these methods on mobile devices.
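For context, the sketch below illustrates the generic low-rank idea behind such factorization methods: a weight matrix is approximated by the product of two thinner factors so that one large matrix-vector product is replaced by two smaller ones. The truncated-SVD recipe and the rank k used here are illustrative assumptions, not the scheme of any particular cited work.

import numpy as np

def lowrank_factorize(W, k):
    # Approximate an m x n weight matrix W by two thinner factors
    # U_k (m x k) and V_k (k x n) via truncated SVD. Replacing W with
    # U_k @ V_k reduces multiplies per input from m*n to (m + n)*k.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k] * s[:k]   # absorb singular values into the left factor
    V_k = Vt[:k, :]
    return U_k, V_k

# Example: a 1024 x 512 fully connected layer truncated to rank 64.
W = np.random.randn(1024, 512).astype(np.float32)
U_k, V_k = lowrank_factorize(W, k=64)
x = np.random.randn(512).astype(np.float32)
y_approx = U_k @ (V_k @ x)   # two small products instead of one large one

Note that both factors remain full precision, which is why such methods alone do not remove the multiply operations.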
Fixed-point quantization can partially alleviate the two problems mentioned above. There have been many studies on reducing the storage and the computational complexity of DNNs by quantizing the parameters of these models. Some of these works [3, 6, 8, 22, 24] quantize the pretrained weights using several bits (usually 3∼12 bits) with a minimal loss of performance. However, these kinds of quantized networks still require large numbers of multiply-accumulate operations. Others [23, 1, 4, 2, 17, 12, 25] focus on training networks from scratch with binary (+1 and -1) or ternary (+1, 0 and -1) weights. These methods do not rely on pretrained models and may reduce computation at the training stage as well as the testing stage. On the other hand, they cannot make use of pretrained models very efficiently due to the dramatic information loss caused by the binary or ternary quantization of weights.
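As a point of reference, the sketch below shows a simple threshold-based ternary quantization of a pretrained weight matrix, in the spirit of the ternary-weight methods cited above; the threshold rule, the shared scale alpha, and the factor 0.7 are illustrative assumptions, not the approach proposed in this paper.

import numpy as np

def ternarize(W, t=0.7):
    # Map a pretrained weight matrix to alpha * T with T in {-1, 0, +1}.
    # Small-magnitude weights become 0; the rest keep their sign and share
    # one full-precision scale alpha (a common heuristic, assumed here).
    delta = t * np.mean(np.abs(W))            # magnitude threshold
    mask = np.abs(W) > delta                  # positions kept as nonzero
    T = (np.sign(W) * mask).astype(np.int8)   # ternary codes
    alpha = np.abs(W[mask]).mean() if mask.any() else 0.0
    return alpha, T

alpha, T = ternarize(np.random.randn(256, 256).astype(np.float32))
# Inference then computes alpha * (T @ x), so the inner products need
# only additions and subtractions instead of full multiplications.

The direct rounding of pretrained weights in this manner is precisely where the large information loss arises, which motivates the factorized approach introduced next.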
In this paper, we propose a unified framework called Fixed-point Factorized Network (FFN) to simultaneously accelerate and compress DNN models with only minor performance degradation. Specifically, we propose to first directly factorize the weight matrix using a fixed-point (+1, 0 and -1) representation and then recover the (pseudo) full-precision submatrices. We also propose an effective and practical technique called weight balancing, which makes