286 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 10, NO. 2, JUNE 2009
large intraclass variability in the pedestrian class. To deal with this problem, we propose a tree-structured two-stage detector based on Haar-like and HOG features to distinguish pedestrians from nonpedestrian candidates. Gentle AdaBoost is used to select the critical features and learn the classifiers from the training images. The classifier based on Haar-like features performs a rough classification, focusing on rejecting nonpedestrian candidates and selecting well-bounded candidates. Because pedestrian size varies over a wide range, three HOG-based classifiers are trained on three separate sets containing images of different size ranges to give a precise classification. Splitting the problem by size in this way reduces the classification complexity and helps improve system performance.
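The routing logic of the tree-structured detector can be sketched as follows. This is a minimal illustration, assuming each trained classifier is exposed as a callable that returns a signed score (positive means pedestrian, as with the sign of a boosted ensemble). The names `Candidate`, `TwoStageDetector`, and `hog_stages` are hypothetical, and the Gentle AdaBoost training itself is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """A candidate bounding box (pixel coordinates) from ROI generation."""
    x: int
    y: int
    width: int
    height: int


# Assumed interface: a classifier maps a candidate to a real-valued score,
# and a positive score means "pedestrian".
Classifier = Callable[[Candidate], float]


@dataclass
class TwoStageDetector:
    haar_stage: Classifier
    # One entry per size range: (min_height, max_height, classifier),
    # with half-open ranges [min_height, max_height).
    hog_stages: List[Tuple[int, int, Classifier]]

    def classify(self, cand: Candidate) -> bool:
        # Stage 1: the cheap Haar-like classifier rejects most nonpedestrians.
        if self.haar_stage(cand) <= 0:
            return False
        # Stage 2: route the survivor to the HOG classifier for its size range.
        for min_h, max_h, clf in self.hog_stages:
            if min_h <= cand.height < max_h:
                return clf(cand) > 0
        # No classifier covers this size, so the candidate is rejected.
        return False
```

For example, with hypothetical height ranges of [20, 48), [48, 96), and 96 pixels and above, a 60-pixel-tall candidate that passes the Haar-like stage is scored only by the middle HOG classifier. Dispatching on size keeps each HOG classifier specialized to one scale, which is where the reduction in classification complexity comes from.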
Although the object classification achieves an FA rate as low as 1%, some FAs still flash up intermittently because of the huge number of candidates processed in real time. To suppress spurious detections and fill detection gaps between frames, pedestrian tracking based on the Kalman filter and template matching is adopted to filter and refine the detection results.
The tracking algorithm relies on the Kalman filter to provide a spatial estimate of each detected pedestrian, and the detection confidence in each frame is accumulated to determine the detection certainty over time. For data association, the nearest overlapping neighbor under the combined distance criterion is selected as the observation. If the nearest-neighbor method fails, appearance-based template matching is used to search for a possible observation. The tracking process is divided into two stages: pretracking and tracking. Newly detected objects enter the pretracking stage. Only after passing multiframe validation in the pretracking stage are they tracked as pedestrians by the system and shown as output alarms.
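The tracking stage can be approximated by the sketch below: a constant-velocity Kalman filter on the box center, a gated nearest-neighbor association step, and a pretracking-to-tracking promotion rule driven by accumulated confidence. All names, the noise values (`q`, `r`), the gate, and the promotion parameters (`confirm_hits`, `confirm_conf`, `miss_penalty`) are assumptions for illustration. Plain Euclidean distance stands in for the paper's combined distance criterion, and the template-matching fallback is omitted.

```python
import math
from typing import List, Optional, Sequence, Tuple

import numpy as np

Point = Tuple[float, float]


class ConstantVelocityKF:
    """Kalman filter on [cx, cy, vx, vy]; only the box center is observed."""

    def __init__(self, cx: float, cy: float, q: float = 1.0, r: float = 4.0):
        self.x = np.array([cx, cy, 0.0, 0.0])
        self.P = np.eye(4) * 10.0               # initial uncertainty (assumed)
        self.F = np.array([[1, 0, 1, 0],        # cx += vx each frame
                           [0, 1, 0, 1],        # cy += vy each frame
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q                  # process noise (assumed)
        self.R = np.eye(2) * r                  # measurement noise (assumed)

    def predict(self) -> np.ndarray:
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z: Point) -> None:
        innov = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innov
        self.P = (np.eye(4) - K @ self.H) @ self.P


def nearest_index(pred: Sequence[float], points: List[Point],
                  gate: float) -> Optional[int]:
    """Index of the closest point within `gate` of the prediction, else None."""
    best, best_d = None, gate
    for i, (px, py) in enumerate(points):
        d = math.hypot(px - pred[0], py - pred[1])
        if d <= best_d:
            best, best_d = i, d
    return best


class Track:
    """Pretracking -> tracking lifecycle with accumulated confidence."""

    def __init__(self, cx: float, cy: float, confirm_hits: int = 3,
                 confirm_conf: float = 2.0, miss_penalty: float = 1.0):
        self.kf = ConstantVelocityKF(cx, cy)
        self.state = "pretracking"
        self.hits = 0
        self.confidence = 0.0
        self.confirm_hits = confirm_hits
        self.confirm_conf = confirm_conf
        self.miss_penalty = miss_penalty

    def step(self, detections: List[Tuple[Point, float]],
             gate: float = 20.0) -> str:
        """Advance one frame given (center, score) detections."""
        pred = self.kf.predict()
        i = nearest_index(pred, [p for p, _ in detections], gate)
        if i is not None:
            point, score = detections[i]
            self.kf.update(point)
            self.hits += 1
            self.confidence += score
        else:
            # Missed frame: coast on the prediction and lose confidence.
            self.confidence -= self.miss_penalty
        if (self.state == "pretracking" and self.hits >= self.confirm_hits
                and self.confidence >= self.confirm_conf):
            self.state = "tracking"   # only now would it be shown as an alarm
        return self.state
```

In this sketch, a new `Track` would be created for each unmatched detection and advanced once per frame; only tracks that have reached `"tracking"` would be reported, which is how multiframe validation filters out detections that flash up in a single frame.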
IV. ROI GENERATION
The ROI generation module, which extracts regions that potentially contain pedestrians, can be regarded as a rough classifier operating on the entire original image. However, learning-based approaches are generally too time consuming to run over the entire image and are thus unsuitable here; this holds even for the AdaBoost classifier based on simple Haar-like features, the most efficient one we adopt. The hard real-time constraint therefore leaves rule-based methods as the only viable choice.
Different rule-based methods suit different types of images. In grayscale images captured at night by an NIR or normal camera, pedestrians appear brighter than the surrounding background, and this property is commonly exploited to extract the ROIs through thresholding.
A. Image Segmentation
Thresholding is a common and simple way to divide a gray image into foreground and background. Under uneven lighting conditions, the popular solution is adaptive thresholding, in which different thresholds are used for different pixels or subregions of the image [25].
Generally, the adaptive threshold for each pixel is calculated individually from its local neighborhood [13], [21]. However, in cluttered scenes, the segmented object regions may merge with the bright background or be split apart by the nonuniform brightness of pedestrians. Such false segmentation often causes classification to fail and decreases the DR.

Fig. 3. Analysis of a typical pedestrian area. (a) Original image. (b) Topographic surface of (a). (c) and (d) The intensity values of the scan lines marked with arrows.
To retain the low computational cost of thresholding while reducing segmentation errors, we propose an adaptive dual-threshold segmentation algorithm that segments the foreground efficiently. Unlike Tian et al. [13], who calculate the thresholds over a square neighborhood, we determine the two thresholds locally along horizontal scan lines and tune the parameters experimentally.
If pedestrians appear brighter than the surrounding background, this remains true when viewed along horizontal scan lines, even when the pedestrians have nonuniform brightness. Fig. 3 presents an example of a pedestrian with a dark upper body and a bright lower body. The two scan lines show that the pixels in the pedestrian area are brighter than the nearby background pixels on both sides of the person. This condition is easier to satisfy than that of common adaptive thresholding algorithms based on local regions, where pixels belonging to objects must be brighter than the background over a large square neighborhood.
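To illustrate why the scan-line view tolerates nonuniform brightness, the sketch below applies a single threshold to one scan line: a pixel is marked as foreground when it exceeds the mean of a horizontal window centered on it by a fixed offset. This is a hypothetical stand-in for illustration, not the authors' Eq. (1); `half_width`, `offset`, and the synthetic intensity profiles are assumptions.

```python
import numpy as np


def scanline_foreground(line, half_width=8, offset=10.0):
    """Boolean mask of pixels brighter than their horizontal local mean.

    Hypothetical single-threshold rule: foreground if
    line[c] > mean(line[c - half_width : c + half_width + 1]) + offset.
    """
    line = np.asarray(line, dtype=float)
    k = 2 * half_width + 1
    # Replicate the end pixels so every window holds k samples.
    padded = np.pad(line, half_width, mode="edge")
    local_mean = np.convolve(padded, np.ones(k) / k, mode="valid")
    return line > local_mean + offset


def synthetic_line(background, person, width=40, start=17, stop=23):
    """Flat background with a brighter person segment in [start, stop)."""
    line = np.full(width, float(background))
    line[start:stop] = person
    return line


# Two made-up scan lines in the spirit of Fig. 3: the upper body (90) is
# darker than the road behind the legs (150), so no global threshold keeps
# both body parts without the road, yet each row is brighter than its
# own horizontal surroundings.
upper = synthetic_line(background=40, person=90)
lower = synthetic_line(background=150, person=200)
print(np.flatnonzero(scanline_foreground(upper)))  # [17 18 19 20 21 22]
print(np.flatnonzero(scanline_foreground(lower)))  # [17 18 19 20 21 22]
```

Both the dark upper-body row and the bright lower-body row yield the same person columns, because each pixel is compared only with its own row neighborhood.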
Calculating the thresholds from the scan lines has another advantage: the algorithm tends to segment vertical bright regions of suitable width, which not only helps break the connection to the background but also prevents segmenting bright regions of large horizontal extent. Fig. 4(b) and (c) compare the results of [13] and (1): the thresholds calculated from the square neighborhood produce a large bright region on the road that does not appear in the result of the thresholds calculated from the scan lines.
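The contrast drawn with Fig. 4 can be reproduced qualitatively on a toy image. The sketch compares a generic square-window local-mean threshold (an assumed stand-in for the neighborhood rule of [13], whose exact formula is not given here) with the same rule restricted to a horizontal scan line. The synthetic image contains a narrow vertical bright strip (pedestrian-like) and a full-width horizontal bright band (road-like); window size, offset, and intensities are assumptions.

```python
import numpy as np


def square_window_mask(img, half=8, offset=10.0):
    """Foreground if brighter than the mean of a (2*half+1)^2 square window."""
    img = np.asarray(img, dtype=float)
    k = 2 * half + 1
    p = np.pad(img, half, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    ii[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    window_sum = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
    return img > window_sum / (k * k) + offset


def scanline_mask(img, half=8, offset=10.0):
    """Foreground if brighter than the mean of a 1 x (2*half+1) row window."""
    img = np.asarray(img, dtype=float)
    k = 2 * half + 1
    p = np.pad(img, ((0, 0), (half, half)), mode="edge")
    c = np.zeros((p.shape[0], p.shape[1] + 1))
    c[:, 1:] = p.cumsum(axis=1)          # per-row prefix sums
    return img > (c[:, k:] - c[:, :-k]) / k + offset


# Toy scene: dark background, a 4-pixel-wide vertical strip, a 3-row band.
img = np.full((40, 60), 50.0)
img[2:15, 10:14] = 130.0   # pedestrian-like vertical strip
img[25:28, :] = 130.0      # road-like horizontal bright band

sq, sl = square_window_mask(img), scanline_mask(img)
print(sq[2:15, 10:14].all(), sq[25:28, :].all())   # True True
print(sl[2:15, 10:14].all(), sl[25:28, :].any())   # True False
```

Both rules keep the narrow strip, but only the square-window rule also marks the wide band: inside the band, a horizontal window sees nothing but band pixels, so no pixel exceeds its row mean. This mirrors the large road region attributed to the square-neighborhood thresholds.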
However, because the brightness of both the background and the pedestrians varies over a wide range, the 1-D signals from the scan lines are always contaminated by noise, and employing a single threshold for each pixel may easily cause a failure, as shown in Fig. 4(c). Thus, two thresholds are adopted to suppress