Squeezed Edge YOLO: Onboard Object Detection on
Edge Devices
Edward Humes¹, Mozhgan Navardi², Tinoosh Mohsenin²
¹University of Maryland, Baltimore County
²Johns Hopkins University
ehumes2@umbc.edu, {mnavard1, tinoosh}@jhu.edu
Abstract
Demand for efficient onboard object detection is increasing due to its key role
in autonomous navigation. However, deploying object detection models such as
YOLO on resource-constrained edge devices is challenging due to the high
computational requirements of such models. In this paper, a compressed object
detection model named Squeezed Edge YOLO is examined. This model is compressed
and optimized to kilobytes of parameters in order to fit onboard such edge devices.
To evaluate Squeezed Edge YOLO, two use cases, human and shape detection,
are used to show the model's accuracy and performance. Moreover, the model is
deployed onboard a GAP8 processor with 8 RISC-V cores and an NVIDIA Jetson
Nano with 4GB of memory. Experimental results show that the Squeezed Edge YOLO
model size is reduced by 8x, which leads to a 76% improvement in
energy efficiency and 3.3x higher throughput.
1 Introduction
Interest in Machine Learning (ML) is dramatically increasing as it provides a promising solution for
various applications such as autonomous navigation [1, 2]. Object detection models in particular can
significantly assist in autonomous navigation by detecting obstacles and pre-defined objects of inter-
est in the environment [3]. However, object detectors have high computational requirements due to
the need for accuracy and the ability to detect various object categories. GPUs with significant com-
putational capacity are often mandatory to train such complex models, yet onboard processing and
edge computing necessitate low-power and low-computation algorithms as a result of the limited
power and computational capacity available [4].
Object detectors are trained classifiers that can identify and locate multiple objects within an image.
These detectors are trained on a set of annotated images, and their accuracy is evaluated on unseen
datasets. There are two commonly used object detector paradigms: single-shot and two-shot. Single-
shot-based methods such as You Only Look Once (YOLO) [5], Single Shot Detector (SSD) [6], etc.,
directly predict the class probabilities and Bounding Box (BBox) coordinates for objects in an im-
age. In contrast, two-shot architectures such as R-CNN [7], Faster R-CNN [8], etc., generate a set
of region proposals and then classify and refine them to produce the final detections.
Two-shot object detection methods have several advantages over single-shot methods, including robustness
to scale and size variations, accurate localization, flexibility, and improved object recognition [9].
However, these advantages come at the expense of inference speed, with single-shot object detectors
generally being faster than two-shot object detectors. Despite this, even single-shot object detection models
are difficult to deploy on resource-constrained edge devices due to their high computational com-
plexity. Therefore, it is important to improve object detection models to meet power consumption
and real-time requirements on such devices [10, 11].
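To make the single-shot formulation concrete, the sketch below decodes a YOLO-style output grid into class probabilities and BBox coordinates. It is an illustrative example only: the (S, S, 5 + num_classes) output layout with [tx, ty, tw, th, objectness, class logits] per cell and the function name decode_yolo_grid are assumptions for this sketch, not the actual Squeezed Edge YOLO head.

```python
import numpy as np

def decode_yolo_grid(pred, num_classes, conf_thresh=0.5):
    """Decode a YOLO-style single-shot output grid into detections.

    `pred` is assumed to have shape (S, S, 5 + num_classes): each grid cell
    holds [tx, ty, tw, th, objectness, class logits...]. This layout is an
    illustrative assumption, not the exact Squeezed Edge YOLO head.
    """
    S = pred.shape[0]
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    detections = []
    for row in range(S):
        for col in range(S):
            cell = pred[row, col]
            obj = sigmoid(cell[4])
            if obj < conf_thresh:
                continue
            # Box center is an offset within its grid cell; width and height
            # are predicted relative to the image size.
            cx = (col + sigmoid(cell[0])) / S
            cy = (row + sigmoid(cell[1])) / S
            w, h = np.exp(cell[2]) / S, np.exp(cell[3]) / S
            # Softmax over the class logits gives per-class probabilities.
            logits = cell[5:5 + num_classes]
            class_probs = np.exp(logits) / np.exp(logits).sum()
            detections.append((cx, cy, w, h, obj, int(np.argmax(class_probs))))
    return detections
```

In a full pipeline, the decoded boxes would then be filtered with non-maximum suppression before being reported as final detections; that step is omitted here for brevity.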
In recent years, researchers have presented optimized object detection models [10, 12, 13, 14, 15,
16, 17, 18] to enable onboard object detection on edge devices. Work in [13, 14, 15] proposed an