From Hashing to CNNs: Training
Binary Weight Networks via Hashing
Qinghao Hu, Peisong Wang, Jian Cheng
Institute of Automation, Chinese Academy of Sciences, Beijing, China
University of Chinese Academy of Sciences, Beijing, China
Center for Excellence in Brain Science and Intelligence Technology, CAS, Beijing, China
{qinghao.hu, peisong.wang, jcheng}@nlpr.ia.ac.cn
Abstract
Deep convolutional neural networks (CNNs) have shown appealing performance on various computer vision tasks in recent years, which motivates their deployment in real-world applications. However, most state-of-the-art CNNs require large amounts of memory and computation, which hinders deployment on mobile devices. Recent studies show that low-bit weight representations can greatly reduce storage and memory demands and also enable efficient network inference. To this end, we propose a novel approach named BWNH to train Binary Weight Networks via Hashing. In this paper, we first reveal the strong connection between inner-product preserving hashing and binary weight networks, and show that training binary weight networks can be intrinsically regarded as a hashing problem. Based on this perspective, we propose an alternating optimization method to learn the hash codes instead of directly learning binary weights. Extensive experiments on CIFAR10, CIFAR100, and ImageNet demonstrate that our proposed BWNH outperforms the current state of the art by a large margin.
Introduction
Since AlexNet (Krizhevsky, Sutskever, and Hinton 2012) achieved great success in ILSVRC2012 (Russakovsky et al. 2015), deep convolutional neural networks have become increasingly popular. Since then, various CNN models have been proposed, such as VGGNet (Simonyan and Zisserman 2014), Inception (Szegedy et al. 2016), and ResNet (He et al. 2016). Nowadays, these CNN models play an important role in many computer vision areas (Krizhevsky, Sutskever, and Hinton 2012; Ren et al. 2015; Long, Shelhamer, and Darrell 2015).
Attracted by the strong performance of CNN models, many practitioners try to deploy CNNs in real-world applications. Yet the huge computational complexity and large parameter size make CNN models hard to deploy on resource-limited devices such as mobile phones and embedded devices. The huge computational complexity makes the inference phase very slow, which is unacceptable for many real-time applications. The large parameter size brings three difficulties. First, deploying CNN models consumes a large amount of disk storage. Second, much run-time memory is required, which is limited on
many mobile devices. Third, a large parameter size causes heavy DRAM access, which consumes more energy. Since battery power is very limited on many mobile devices, this severely shortens battery life.
To alleviate these problems, a variety of methods have been proposed to reduce the parameter size or accelerate the inference phase. These methods fall into three main categories: low-rank-decomposition-based methods, pruning-based methods, and quantization-based methods. Low-rank-decomposition-based methods (Denton et al. 2014; Jaderberg, Vedaldi, and Zisserman 2014; Zhang et al. 2015; Wang and Cheng 2016) decompose a weight matrix (tensor) into several smaller matrices (tensors). These methods achieve good speed-ups for large convolutional kernels but usually perform poorly for small kernels. Moreover, the parameter compression ratios they achieve are relatively modest.
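As a minimal illustration of this idea (a generic sketch using a truncated SVD, not the exact scheme of any cited method; the layer sizes and rank below are hypothetical), a weight matrix can be replaced by two smaller factors, trading accuracy for a lower parameter count:

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate an m x n weight matrix W as the product of an
    m x rank factor and a rank x n factor via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

# Hypothetical 256 x 512 fully connected layer, factorized at rank 32:
# storage drops from 256*512 to (256 + 512)*32 parameters.
W = np.random.randn(256, 512).astype(np.float32)
U_r, V_r = low_rank_factorize(W, rank=32)
W_approx = U_r @ V_r  # used in place of W at inference time
```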
ing has a long history and is still a widely used technique
for CNN compression and acceleration (Han et al. 2015;
Liu et al. 2015). The main idea of these methods is to remove low-saliency parameters or small-weight connections (Han et al. 2015). In general, after the pruning step, k-means clustering and Huffman coding are also required to achieve a good compression ratio. However, k-means clustering and Huffman coding complicate the inference phase, since the Huffman codes must be decoded and a lookup table is needed for the k-means dictionary. As a result, the decoding step and the lookup table introduce extra memory and computational overhead.
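A rough sketch of the magnitude-pruning step is shown below (an illustrative example with hypothetical sparsity settings; the retraining, k-means weight sharing, and Huffman coding that follow in real pipelines are omitted):

```python
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    """Zero out the smallest-magnitude weights, keeping roughly a
    (1 - sparsity) fraction of the connections."""
    threshold = np.quantile(np.abs(W), sparsity)
    mask = np.abs(W) > threshold
    return W * mask, mask

W = np.random.randn(128, 128).astype(np.float32)
W_pruned, mask = magnitude_prune(W, sparsity=0.9)
print("non-zero fraction:", mask.mean())  # roughly 0.1
```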
Quantization-based methods include codebook-based quantization and low-bit weight representation. Codebook-based quantization methods mainly use vector quantization algorithms such as k-means and product quantization (Gong et al. 2014; Wu et al. 2016) to quantize the weight kernels. These methods require lookup tables to store the dictionary and are unfriendly to cache memory, since lookup-table accesses are random and unordered.
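A simple illustration of codebook-based quantization follows (a plain 1-D k-means over the weights, standing in for the vector and product quantization used in the cited work; the cluster count and iteration budget are assumptions): each weight is stored as an index into a small codebook, and inference must look the value up again, which is the source of the cache-unfriendly access pattern.

```python
import numpy as np

def kmeans_codebook(W, n_clusters=16, n_iters=25):
    """Quantize weights with a shared codebook: store one small
    codebook plus a low-bit index per weight."""
    w = W.ravel()
    codebook = np.linspace(w.min(), w.max(), n_clusters)
    for _ in range(n_iters):
        # Assign every weight to its nearest codeword.
        idx = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        # Move each codeword to the mean of its assigned weights.
        for k in range(n_clusters):
            if np.any(idx == k):
                codebook[k] = w[idx == k].mean()
    return codebook, idx.reshape(W.shape)

W = np.random.randn(64, 64).astype(np.float32)
codebook, indices = kmeans_codebook(W)
# Inference-time reconstruction: a per-weight table lookup whose
# index pattern is effectively random.
W_hat = codebook[indices]
```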
Low-bit weight representation methods (Lin, Talathi, and Annapureddy 2015; Gupta et al. 2015;
Rastegari et al. 2016; Dong et al. 2017) represent weights as low-bit fixed-point or even binary values. Low-bit weight representations reduce run-time memory and storage demands, as no decoding or lookup tables are required. As a special case of low-bit weight representation, binary weights achieve about a 32× compression ratio. In addition, since