DEEP FEATURE COMPRESSION FOR COLLABORATIVE OBJECT DETECTION
Hyomin Choi and Ivan V. Bajić
School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada
ABSTRACT
Recent studies have shown that the efficiency of deep neural networks in mobile applications can be significantly improved by distributing the computational workload between the mobile device and the cloud. This paradigm, termed collaborative intelligence, involves communicating feature data between the mobile and the cloud. The efficiency of such an approach can be further improved by lossy compression of feature data, which has not been examined to date. In this work we focus on collaborative object detection and study the impact of both near-lossless and lossy compression of feature data on its accuracy. We also propose a strategy for improving the accuracy under lossy feature compression. Experiments indicate that, using this strategy, the communication overhead can be reduced by up to 70% without sacrificing accuracy.
Index Terms— Deep feature compression, collaborative
intelligence, compression-augmentation, object detection
1. INTRODUCTION
Mobile and Internet-of-Things (IoT) [1] devices are increasingly relying on Artificial Intelligence (AI) engines to enable sophisticated applications such as personal digital assistants [2], self-driving vehicles, autonomous drones, smart cities, and so on. The AI engines themselves are generally built on deep learning models. The most common way of deploying such models is to place them in the cloud and have the sensor data (images, speech, etc.) uploaded from the mobile to the cloud for processing. This is referred to as the cloud-only approach. More recently, with smaller graphical processing units (GPUs) making their way into mobile/IoT devices, some deep models might be able to run on the mobile device, an approach referred to as mobile-only.
A recent study [3] has examined a spectrum of possibilities between the cloud-only and mobile-only extremes. Specifically, they considered splitting a deep network into two parts: the front end (consisting of an input layer and a number of subsequent layers), which runs on the mobile, and the back end (consisting of the remaining layers), which runs on the cloud. In this approach, termed collaborative intelligence, the front end computes features up to some layer in the network, then these features are uploaded to the cloud for the remainder of the computation. The authors examined the energy consumption and latency associated with performing computation in this way, for various split points in typical deep models. Their findings indicate that significant savings can be achieved in both energy and latency if the network is split appropriately. They also proposed an algorithm called Neurosurgeon to find the optimal split point, depending on whether energy or latency is to be minimized.
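As a toy illustration of such a split (our own sketch, not the implementation from [3]), a model can be viewed as a sequence of layer functions; the mobile runs the front end up to a chosen split index and uploads only that layer's output, while the cloud finishes the remaining layers. All layer shapes and the split index below are hypothetical:

```python
import numpy as np

# Toy "network": a list of layer functions standing in for conv/pool layers.
# Weights are fixed constants so the example is deterministic.
layers = [
    lambda x: np.maximum(x @ np.full((8, 6), 0.1), 0),  # layer 1: 8 -> 6, ReLU
    lambda x: np.maximum(x @ np.full((6, 4), 0.1), 0),  # layer 2: 6 -> 4, ReLU
    lambda x: x @ np.full((4, 2), 0.1),                 # layer 3: 4 -> 2
]

def run(x, layer_seq):
    """Apply a sequence of layers to input x."""
    for f in layer_seq:
        x = f(x)
    return x

split = 2  # hypothetical split point, as a Neurosurgeon-like profiler might pick

x = np.ones((1, 8))
features = run(x, layers[:split])          # front end, computed on the mobile
cloud_out = run(features, layers[split:])  # features "uploaded", back end on the cloud
full_out = run(x, layers)                  # reference: whole model in one place

# Splitting the computation does not change the result.
assert np.allclose(cloud_out, full_out)
```

The point of the split is that `features` may be much smaller than the raw input, so only a small payload crosses the uplink.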
The reason why collaborative intelligence can be more efficient than cloud-only and mobile-only approaches is that the feature data volume in deep convolutional neural networks (CNNs) typically decreases as we move from the input to the output. Executing the initial layers on the mobile costs some energy and time, but if the network is split appropriately, far less data needs to be uploaded to the cloud, which saves both transmission latency on the uplink and the energy used for radio transmission. Hence, on balance, there may be a net benefit in energy and/or latency. According to [3], depending on the resources available (GPU or CPU on the mobile, speed and energy for wireless transmission, etc.), optimal split points for CNNs tend to be deep in the network.
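This trend can be seen by tabulating per-layer feature volume in a typical CNN. The shapes below are hypothetical (loosely VGG-like, not taken from the paper), but they follow the usual pattern: spatial downsampling eventually outpaces channel growth, so deep layers carry far fewer values than the input:

```python
# Hypothetical (H, W, C) feature-map shapes along a VGG-like network;
# illustrative values only, not measurements from the paper.
shapes = {
    "input": (224, 224, 3),
    "conv1": (224, 224, 64),   # early layers can expand the data volume
    "pool2": (56, 56, 128),
    "pool4": (14, 14, 512),
    "pool5": (7, 7, 512),      # deep layers carry far fewer values
}

volumes = {name: h * w * c for name, (h, w, c) in shapes.items()}
for name, v in volumes.items():
    print(f"{name:>6}: {v:>10,} values")

# Splitting late in the network minimizes the data uploaded to the cloud.
assert volumes["pool5"] < volumes["input"]
```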
A recently released study [4] has extended the approach of [3] to include model training and additional network architectures. While the network is again split between the mobile and the cloud, in the framework proposed in [4] the data can move both ways between the mobile and the cloud in order to optimize the efficiency of both training and inference.
While [3, 4] have established the potential benefits of collaborative intelligence, the issue of efficient transfer of feature data between the mobile and the cloud is largely unexplored. Specifically, [3] does not consider feature compression at all, while [4] uses 8-bit quantization of feature data followed by lossless compression, but does not examine the impact of such processing on the application. Feature compression can further improve the efficiency of collaborative intelligence by minimizing the latency and energy of feature data transfer. The impact of compressing the input has been studied in several CNN applications [5, 6, 7], and the effects vary from case to case. However, to our knowledge, the impact of feature compression has not been studied yet.
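A rough sketch of the kind of pipeline [4] describes is uniform 8-bit quantization of a feature tensor followed by a lossless codec. The codec choice here (zlib) and the tensor shape are our assumptions for illustration; [4] does not necessarily use either:

```python
import numpy as np
import zlib

rng = np.random.default_rng(0)
# Stand-in feature tensor; real features would come from a split CNN layer.
# ReLU-like clipping makes it sparse, as post-activation features often are.
features = np.maximum(rng.standard_normal((64, 28, 28)), 0).astype(np.float32)

# Uniform 8-bit quantization over the tensor's dynamic range.
fmin, fmax = float(features.min()), float(features.max())
scale = (fmax - fmin) / 255.0 or 1.0  # guard against a constant tensor
q = np.round((features - fmin) / scale).astype(np.uint8)

# Lossless coding of the quantized bytes (zlib as an example codec).
compressed = zlib.compress(q.tobytes(), level=9)

# Cloud side: decompress and de-quantize; the reconstruction error is
# bounded by half a quantization step.
raw = zlib.decompress(compressed)
restored = np.frombuffer(raw, dtype=np.uint8).reshape(q.shape) * scale + fmin
assert np.abs(restored - features).max() <= scale / 2 + 1e-6

ratio = features.nbytes / len(compressed)
print(f"compression ratio vs. float32: {ratio:.1f}x")
```

This pipeline is near-lossless with respect to the quantized values; the open question the paper addresses is how such quantization error, and stronger lossy coding, affect detection accuracy downstream.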
In this work, we focus on a deep model for object detection and study the impact of feature compression on its accuracy. Section 2 presents preliminaries, while Section 3 describes the proposed methods. Experimental results and conclusions are presented in Sections 4 and 5, respectively.
arXiv:1802.03931v1 [cs.CV] 12 Feb 2018