H. LAW, Y. TANG, O. RUSSAKOVSKY, J. DENG: CORNERNET-LITE 3
achieves an AP of 34.4% on COCO at 30ms, simultaneously more accurate and faster than
YOLOv3 (33.0% at 39ms).
A natural question is whether CornerNet-Squeeze can be combined with saccades to im-
prove its efficiency even further. Somewhat surprisingly, our experiments give a negative
answer: CornerNet-Squeeze-Saccade turns out to be slower and less accurate than
CornerNet-Squeeze. This is because for saccades to help, the network needs to be able to
generate sufficiently accurate attention maps, but the ultra-compact architecture of CornerNet-Squeeze
does not have this extra capacity. In addition, the original CornerNet is applied at multiple
scales, which provides ample room for saccades to cut down on the number of pixels to
process. In contrast, CornerNet-Squeeze is already applied at a single scale due to the ultra-tight
inference budget, which leaves much less room for saccades to yield savings.
Significance and novelty: Collectively, these two variants of CornerNet-Lite make the
keypoint-based approach competitive, covering two popular use cases: CornerNet-Saccade
for offline processing, improving efficiency without sacrificing accuracy, and CornerNet-
Squeeze for real-time processing, improving accuracy without sacrificing efficiency.
Both variants of CornerNet-Lite are technically novel. CornerNet-Saccade is the first to
integrate saccades with keypoint-based object detection. Its key difference from prior work
lies in how each crop (of pixels or feature maps) is processed. Prior work that employs
saccade-like mechanisms either detects a single object per crop (e.g. Faster R-CNN [48])
or produces multiple detections per crop with a two-stage network involving additional sub-
crops (e.g. AutoFocus [38]). In contrast, CornerNet-Saccade produces multiple detections
per crop with a single-stage network.
CornerNet-Squeeze is the first to integrate SqueezeNet with the stacked hourglass
architecture and to apply such a combination to object detection. Prior works that employ the
hourglass architecture have excelled at achieving competitive accuracy, but it was unclear
whether and how the hourglass architecture can be competitive in terms of efficiency. Our
design and results show, for the first time, that this is possible, particularly in the context of
object detection.
Contributions: Our contributions are three-fold: (1) We propose CornerNet-Saccade and
CornerNet-Squeeze, two novel approaches to improving the efficiency of keypoint-based
object detection; (2) On COCO, we improve the efficiency of state-of-the-art keypoint-based
detection 6-fold and raise the AP from 42.2% to 43.2%; (3) On COCO, we improve both the
accuracy and efficiency of state-of-the-art real-time object detection (from 33.0% at 39ms
for YOLOv3 to 34.4% at 30ms).
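To make the real-time comparison concrete, the short calculation below derives the relative speed-up and the AP gain from the figures quoted above. It uses only the numbers stated in the text; the dictionary layout and the helper name `compare` are illustrative choices, not part of the original work.

```python
# Figures quoted in the text: COCO AP in percent, latency in ms per image.
squeeze = {"ap": 34.4, "latency_ms": 30.0}   # CornerNet-Squeeze
yolov3 = {"ap": 33.0, "latency_ms": 39.0}    # YOLOv3

def compare(new, baseline):
    """Return (speed-up factor, AP gain in percentage points) of new over baseline."""
    speedup = baseline["latency_ms"] / new["latency_ms"]
    ap_gain = new["ap"] - baseline["ap"]
    return speedup, ap_gain

speedup, ap_gain = compare(squeeze, yolov3)
print(f"{speedup:.2f}x faster, {ap_gain:+.1f} AP")
```

In other words, the quoted numbers correspond to roughly a 1.3x reduction in latency together with a gain of about 1.4 AP points.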
2 Related Work
Saccades in Object Detection. Saccades in human vision refer to a sequence of rapid eye
movements to fixate different image regions. In the context of object detection algorithms,
we use the term broadly to mean selectively cropping and processing image regions (sequen-
tially or in parallel, pixels or features) during inference.
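As a rough illustration of this broad definition, the sketch below crops fixed-size windows around a set of candidate locations, runs a detector on each window only, and maps the resulting boxes back to image coordinates. It is a minimal sketch under stated assumptions rather than the method of any particular paper: the function names, the list-of-rows image representation, and the `toy_detector` stand-in are all hypothetical.

```python
def crop(image, cx, cy, size):
    """Crop a size x size window centred near (cx, cy), clamped to the image.

    `image` is a list of rows. Returns (window, x0, y0), where (x0, y0) is the
    window's top-left corner in image coordinates.
    """
    h, w = len(image), len(image[0])
    x0 = max(0, min(cx - size // 2, w - size))
    y0 = max(0, min(cy - size // 2, h - size))
    window = [row[x0:x0 + size] for row in image[y0:y0 + size]]
    return window, x0, y0

def detect_with_saccades(image, centres, size, detect_in_crop):
    """Run `detect_in_crop` only on windows around `centres`.

    Boxes are (x0, y0, x1, y1, score) and are shifted back into image coordinates.
    """
    detections = []
    for cx, cy in centres:
        window, x0, y0 = crop(image, cx, cy, size)
        for bx0, by0, bx1, by1, score in detect_in_crop(window):
            detections.append((bx0 + x0, by0 + y0, bx1 + x0, by1 + y0, score))
    return detections

def toy_detector(window):
    """Hypothetical stand-in detector: one box around all non-zero pixels."""
    pts = [(x, y) for y, row in enumerate(window) for x, v in enumerate(row) if v]
    if not pts:
        return []
    xs, ys = zip(*pts)
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1, 1.0)]

# A 16x16 image containing a single 3x3 "object".
image = [[0] * 16 for _ in range(16)]
for y in range(10, 13):
    for x in range(9, 12):
        image[y][x] = 1

boxes = detect_with_saccades(image, centres=[(10, 11)], size=8,
                             detect_in_crop=toy_detector)
print(boxes)
```

In this toy setting the detector only sees an 8x8 window (64 of the 256 pixels), which is the kind of saving that selective cropping is meant to provide. Where the candidate locations come from is left open here.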
There has been a long history of using saccades [12, 42, 63] in object detection to speed
up inference. For example, a special case of saccades is a cascade that repeatedly selects a
subset of regions for further processing, as exemplified by the Viola-Jones face detector [56].
The idea of saccades has taken diverse forms in various approaches, but can be roughly