Contributions
Our human perception method combines a set of novel
techniques to create a system that is capable of tracking
multiple human targets and rejecting nonhuman objects
from a mobile robot. Moreover, our perception system
is robust to occlusion, illumination changes, and
unpredicted motion patterns. The main contributions of this
article include:
1. the introduction of meanshift clustering for candidate
segmentation when generating plan view maps from an
RGB-D camera, which avoids computationally expensive
plan view map generation over the noisy point cloud,
improves detection precision, and speeds up processing
to achieve real-time performance;
2. the use of point cloud preprocessing, where planes,
cylinders, and other regular objects are removed to
lower the false positive ratio, followed by
tracking-by-detection over a 3D point cloud that associates
motion tracking with object detection and can be
applied widely to HRI.
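As an illustration of the candidate segmentation idea in the first contribution, the following minimal sketch clusters ground-plane projections of 3D points with a flat-kernel meanshift. It is not the implementation used in this article: the function name `meanshift_candidates`, the bandwidth value, the mode-merging rule, and the assumption that the ground is the plane z = 0 (so that projection simply drops the height coordinate) are hypothetical choices made for the example.

```python
import math
import random


def meanshift_candidates(points_3d, bandwidth=0.5, max_iter=50, tol=1e-6):
    """Cluster ground-plane projections of 3D points with a flat-kernel meanshift.

    Assumption for this sketch: the ground is the plane z = 0, so projecting a
    point onto it keeps only its x and y coordinates.
    Returns (labels, centers): one cluster index per point and the 2D modes.
    """
    xy = [(p[0], p[1]) for p in points_3d]  # projection onto z = 0
    if not xy:
        return [], []

    modes = list(xy)  # start one mode at every projected point
    for _ in range(max_iter):
        shifted = []
        for mx, my in modes:
            # Flat kernel: average all points within `bandwidth` of the mode.
            near = [(x, y) for x, y in xy
                    if math.hypot(x - mx, y - my) <= bandwidth]
            shifted.append((sum(x for x, _ in near) / len(near),
                            sum(y for _, y in near) / len(near)))
        step = max(math.hypot(sx - mx, sy - my)
                   for (sx, sy), (mx, my) in zip(shifted, modes))
        modes = shifted
        if step < tol:  # no mode moved noticeably: converged
            break

    # Modes that ended up closer than half a bandwidth form one candidate.
    centers, labels = [], []
    for mx, my in modes:
        for k, (cx, cy) in enumerate(centers):
            if math.hypot(mx - cx, my - cy) < bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append((mx, my))
            labels.append(len(centers) - 1)
    return labels, centers


# Made-up data: two "people" standing 2 m apart, heights up to 1.7 m.
rng = random.Random(0)
cloud = [(rng.gauss(cx, 0.08), rng.gauss(0.0, 0.08), rng.uniform(0.0, 1.7))
         for cx in (0.0, 2.0) for _ in range(50)]
labels, centers = meanshift_candidates(cloud)
print(len(centers), "candidate clusters")
```

The sketch compares every mode with every point, so its cost grows quadratically with the number of points; a practical system would work on a downsampled or preprocessed cloud, which is in line with the plane removal described in the second contribution.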
The remainder of the article is organized as follows. The
second section reviews the related literature on human
perception. The third section describes our approach to
detecting and tracking multiple humans in preprocessed 3D
point clouds. Experimental results are presented in the
fourth section. Finally, the fifth section concludes the
article.
Related work
To achieve natural human perception in crowded human
zones, a large number of human detection and tracking
approaches have been investigated. Consumer-grade cameras
are cost-efficient, so they are widely adopted in
human detection and tracking, and considerable effort has
gone into detecting and tracking people in the real world
from a moving camera. A probabilistic framework was
proposed⁵ to detect multiple people in a crowded scene by
combining multiple detectors. Building on these detectors,
the reversible jump Markov chain Monte Carlo particle
filtering method was adopted to find the maximum a
posteriori (MAP) estimate and track people
in a single coherent framework. Mekonnen et al.¹⁸ designed
a cooperative perception system made up of wall-mounted
cameras and a mobile robot to perceive passers-by and
obtain their positions and trajectories. Jia et al.¹⁹ presented
a visual human tracking approach based on a meanshift
algorithm. In their implementation, color and texture
histograms were integrated into a meanshift tracker under the
double-layer locating mechanism. The Histogram of
Oriented Gradients (HOG) detector,⁴ also known as the
Dalal–Triggs detector, was introduced to localize people
using a sliding window and a support vector machine (SVM)
to discriminate people from other objects. A drawback of
using a single camera is that occlusion causes false negatives.
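To make the HOG descriptor more concrete, the sketch below computes the basic building block of such a detector: a magnitude-weighted histogram of unsigned gradient orientations for a single image cell. It is a simplified illustration, not the Dalal–Triggs implementation; the function name `cell_orientation_histogram`, the nine-bin default, and the hard (non-interpolated) bin assignment are assumptions of this example, and the block normalization, sliding window, and SVM stages of the full detector are omitted.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a 2D list of grayscale intensities. Gradients use central
    differences in the interior and one-sided differences at the border.
    Orientations are folded into [0, 180) degrees and split into `n_bins`.
    """
    rows, cols = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for r in range(rows):
        for c in range(cols):
            # Clamp neighbor indices so border pixels use one-sided differences.
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            gx = (cell[r][c1] - cell[r][c0]) / max(c1 - c0, 1)
            gy = (cell[r1][c] - cell[r0][c]) / max(r1 - r0, 1)
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Guard against an angle that rounds up to exactly 180 degrees.
            index = min(int(angle // bin_width), n_bins - 1)
            hist[index] += magnitude
    return hist


# Made-up cell whose intensity increases from top to bottom, so every
# gradient points straight along the row axis.
vertical_ramp = [[float(r)] * 8 for r in range(8)]
print(cell_orientation_histogram(vertical_ramp))
```

In a complete detector, histograms like this one are computed for every cell of a detection window, normalized over overlapping blocks, and concatenated into the feature vector that the SVM classifies.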
Moreover, a legTracker²⁰ was proposed to detect and
track human legs by applying the support vector
data description scheme to measurements from a laser
range finder. In addition, networks of laser range finders
were calibrated to determine the positions of pedestrians,
which enabled pedestrian tracking with an accuracy within
11 cm.²¹ However, these laser range finder-based human
detection and tracking systems provide only partial depth
information, restricted to a single plane.
3D sensors, such as a 3D-laser, 3D rotating Lidar,
stereo camera, ToF camera, and RGB-D camera, can
provide 3D position information and spatial geometric
constraints of a human. With the assistance of such 3D spatial
information, the robot knows how people move about in
the surrounding environment. Human detection and tracking
systems assisted by depth sensing technology have also
been extensively discussed.
1. 3D-lasers. Spinello et al.¹¹ proposed a novel
approach for pedestrian detection in a 3D range data
Figure 1. Overview of our multiple human detection and tracking system. Starting with the input point cloud data (PCD),
the system: (1) detects the ground and ceiling planes and removes them, using a prior-knowledge guided random sample
consensus (RANSAC) to fit the ground plane; (2) projects all points onto the ground plane and applies a meanshift clustering
algorithm to segment candidates for generating plan view maps; (3) associates motion and detection data for multiple human object
tracking. Tracking results are shown as bounding boxes, each enclosing a tracked human.