
Vision-based ACC with a Single Camera: Bounds on Range and Range Rate Accuracy
Gideon P. Stein, MobileEye Vision Technologies Ltd., Jerusalem, Israel, gideon.stein@mobileye.com
Ofer Mano, MobileEye Vision Technologies Ltd., Jerusalem, Israel, ofer.mano@mobileye.com
Amnon Shashua, Hebrew University, Jerusalem, Israel, shashua@cs.huji.ac.il
Abstract
This paper describes a vision-based Adaptive Cruise Control
(ACC) system which uses a single camera as input. In
particular, we discuss how to compute range and range-rate
from a single camera and how the imaging geometry affects
the range and range-rate accuracy. We determine the bound
on the accuracy given a particular configuration. These
bounds in turn determine what steps must be taken to
achieve good performance. The system has been implemented
on a test vehicle and driven on various highways over
thousands of miles.
1 Introduction
The Adaptive Cruise Control (ACC) application is the most
basic system in the evolutionary line of features where
sensors in the vehicle assist the driver to increase driving
safety and convenience. The ACC is a longitudinal distance
control system designed to find targets (other vehicles),
determine their path position (primary target
determination), measure range and range-rate to the primary
target vehicle, and apply appropriate brake and throttle
actuation to maintain a safe distance to the primary target
and to resume the preset cruising speed when no such targets
are detected by the system. The basic ACC feature is offered
today (as a convenience feature) in serial production models
by an increasing number of car manufacturers.
The underlying range measurement technology of existing
systems falls into the category we call “direct range”
sensors, which includes millimeter wave radars (mostly
77 GHz radars)[1], laser radars (LIDAR) and stereo imaging
(introduced in Japan on the Subaru Legacy Lancaster[2]).
These sensors provide an explicit range measurement per
feature point in the scene. The range map provides strong
cues for segmenting the target from the background scene
and, more importantly for this paper, the explicit range is
then used for distance control.
In this paper we investigate the possibility of performing
distance control, to an accuracy level sufficient for a
serial production ACC product, using a monocular imaging
device (a single video camera) which provides only
“indirect range” using the laws of perspective (to be
described below). This investigation has two motivations:
the first comes from biological vision and the second is
practical. In the human visual system the stereo baseline is
suited to hand-reaching distances and gives only very rough
range measurements at farther distances. Distance control
in an ACC application requires range measurements at
distances of up to 100m, where a human observer cannot
possibly make accurate absolute range estimates. Moreover,
many people suffer from stereo deficiency without any
noticeable effect on their daily visual navigation (and
driving) abilities. On the other hand, based on retinal
divergence (scale change of the target) the human visual
system can make very accurate “time to contact” assessments.
Therefore, the question that arises in this context is: what
measurement accuracies are required for distance control?
Clearly, the range accuracies provided by radar and LIDAR
are sufficient for distance control, but the example of
human vision indicates that perhaps one can achieve
satisfactory actuation control using only the laws of
perspective. The second motivation is practical and is born
of the desire to introduce low-cost solutions for the ACC
application. A stereo design not only incurs the cost of the
additional camera and of the processing power for dense
disparity; maintaining the calibration of the system (the
relative coordinate frames of the two cameras) is also
somewhat challenging for a serial production product[3, 4].
A monocular visual processing system would be easier to
mass produce and would cost less as an end product.
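
To make the “time to contact” idea concrete, the following is a minimal sketch and not the system described in this paper. It assumes a pinhole camera and a constant closing speed between two frames. Under those assumptions the image width of the target is inversely proportional to its range, so the scale change s = w_curr / w_prev between frames separated by dt gives a time to contact of dt / (s - 1) without knowing the range itself. All function names and the numeric values (focal length, vehicle width, ranges) are hypothetical and chosen only for illustration.

```python
def image_width_px(focal_px, vehicle_width_m, range_m):
    """Pinhole projection: image width of a target of known physical width.

    focal_px, vehicle_width_m and range_m are illustrative inputs,
    not values taken from the paper.
    """
    return focal_px * vehicle_width_m / range_m


def time_to_contact_s(width_prev_px, width_curr_px, dt_s):
    """Time to contact from the scale change of the target between two frames.

    Assumes a constant closing speed over dt_s. Returns infinity when the
    target is not growing in the image (range constant or increasing).
    """
    if width_prev_px <= 0 or width_curr_px <= 0 or dt_s <= 0:
        raise ValueError("widths and dt must be positive")
    scale = width_curr_px / width_prev_px
    if scale <= 1.0:
        return float("inf")
    return dt_s / (scale - 1.0)


if __name__ == "__main__":
    # Hypothetical scenario: target closes from 50.0 m to 49.5 m in 0.1 s
    # (closing speed 5 m/s), seen by a camera with a 740-pixel focal length.
    focal_px, vehicle_width_m, dt_s = 740.0, 1.8, 0.1
    w_prev = image_width_px(focal_px, vehicle_width_m, 50.0)
    w_curr = image_width_px(focal_px, vehicle_width_m, 49.5)
    print(f"time to contact: {time_to_contact_s(w_prev, w_curr, dt_s):.2f} s")
```

In this synthetic scenario the result is 49.5 m / 5 m/s = 9.9 s, and it does not depend on the assumed focal length or vehicle width, since both cancel in the width ratio. Note that the width change here is only about a quarter of a pixel, which hints at why image measurement accuracy matters for range-rate estimation.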
The challenges of a monocular visual system are twofold.
On the one hand, the system lacks the depth cues (footnote 1)
used for target segmentation, so pattern recognition
techniques must be relied on heavily to compensate for the
lack of depth. The question that arises there is whether
pattern recognition can be sufficiently robust to meet the
stringent detection accuracy requirements of a serial
production product. On the other hand, and this is the focus
of this paper, once the target is detected, can the laws of
perspective and retinal divergence meet the required
accuracies for actuation control?

Footnote 1: At short distances one can rely on some weak
motion parallax measurements, but those are not available at
ranges beyond 20-30m.