Partitioning of CNN Models for Execution on Fog Devices
Swarnava Dey
TCS Research and Innovation
Kolkata, West Bengal, India
swarnava.dey@tcs.com
Arijit Mukherjee
TCS Research and Innovation
Kolkata, West Bengal, India
mukherjee.arijit@tcs.com
Arpan Pal
TCS Research and Innovation
Kolkata, West Bengal, India
arpan.pal@tcs.com
Balamuralidhar P
TCS Research and Innovation
Bangalore, Karnataka, India
balamurali.p@tcs.com
ABSTRACT
Fog Computing has in recent times captured the imagination of
industrial and research organizations working on various aspects of
connected livelihood and governance of smart cities. Improvements
in deep neural networks have led to extensive use of such models for
analytics and inferencing on large volumes of data, including sensor
observations, images and speech. There is a growing need to run such
inferencing on devices closer to the data sources, i.e., devices which
reside at the edge of the network, popularly known as fog devices,
in order to reduce upstream network traffic. However, these devices
are computationally constrained in nature, and executing complex
deep inferencing models on them has proved difficult. This has led
to several new approaches that partition/distribute the computation
and/or data over multiple fog devices. In this paper we propose a
novel depth-wise input partitioning scheme for CNN models and
experimentally show that it achieves better performance than
row/column or grid based schemes.
CCS CONCEPTS
• Computing methodologies → MapReduce algorithms; • Computer systems organization → Cloud computing; Neural networks;
KEYWORDS
CNN, distributed, Edge, Fog, Cloud, DCNN, convolution, parallel
ACM Reference Format:
Swarnava Dey, Arijit Mukherjee, Arpan Pal, and Balamuralidhar P. 2018.
Partitioning of CNN Models for Execution on Fog Devices. In The 1st ACM
International Workshop on Smart Cities and Fog Computing (CitiFog’18),
November 4, 2018, Shenzhen, China. ACM, New York, NY, USA, 6 pages.
https://doi.org/10.1145/3277893.3277899
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
CitiFog'18, November 4, 2018, Shenzhen, China
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6051-7/18/11...$15.00
https://doi.org/10.1145/3277893.3277899
1 INTRODUCTION
In recent years industries and research organizations have heav-
ily invested in Fog Computing where computational methods are
placed closer to the data sources at the edge of the network. Data
analytic applications processing large volume of sensor data, im-
ages, videos, sounds etc. to generate inferences are primary candi-
date applications for such a processing architecture as processing
the data closer to the source ensures less upstream data traffic.
Example implementations of data analytic applications in Smart
City are available in smart city transport systems [11], smart city
healthcare [22, 23], detection of illegal garbage dumping [3] and
several others. We redirect the reader to a recent survey [14] that
highlights challenges and opportunities in Artificial Intelligence (AI)-
based frameworks for smart cities. It is noteworthy that many of the
above-mentioned and several other data analytic applications for
smart cities are adopting Deep Learning (DL)/inferencing techniques
due to the availability of state-of-the-art (SoA) learning models ready
for transfer learning and fine tuning, resulting in faster time to
market. One of the major challenges of running top-of-the-line deep
models such as Inception, ResNet and VGG on common edge/fog devices
is their computational and memory requirements. In our experiments,
we found that the Inception V3 model [28] cannot be loaded into the
available memory of a Raspberry Pi 3 board without allocating a
USB-based swap space, and it takes nearly five seconds to classify a
single image; the issues are similar for most commonly used models.
In this work, we propose a method to run the deep inference operation
of Convolutional Neural Networks (CNNs) [16] on a set of fog devices
to achieve high-speed inferencing. CNNs are the de facto technique
for image classification and have recently been used for speech and
sensor data as well [13]. Though the concept of collaborative edge
execution of CNNs was introduced earlier by Mao et al. [20], our work
extends the SoA through the following major contributions: 1) a novel
depth-wise input partitioning scheme that removes the overhead
associated with earlier row/column and grid partitioning schemes;
2) highlighting the role of the input and output depth of the current
convolutional layers (CLs) in the speedup achieved by distributed
execution; and 3) demonstrating its effect on distributed execution
through extensive simulations with realistic workloads. We also
validate our partitioning scheme on Inception V3 CLs on a real system
based on Raspberry Pi 3 and TensorFlow [8], achieving a 3x speedup.
The rest of the paper is organized as follows: Section 2 gives a brief
overview of the current state of development in Edge Computing and