Tracking Human Motion Using Multiple Cameras
Q. Cai and J. K. Aggarwal
Computer and Vision Research Center
Department of Electrical and Computer Engineering
The University of Texas at Austin
email: aggarwaljk@mail.utexas.edu
Abstract
This paper presents a framework for tracking human mo-
tion in an indoor environment from sequences of monoc-
ular grayscale images obtained from multiple fixed cam-
eras. Multivariate Gaussian models are applied to find the
most likely matches of human subjects between consecutive
frames taken by cameras mounted in various locations. Ex-
perimental results from real data show the robustness of the
algorithm and its potential for real-time applications.

The research reported in this paper was supported in part by the Army Research Office, Contract DAAH-94-G-0417, and the Texas Advanced Technology Program, Contract ATP-442.
1. Introduction
Tracking human motion in an indoor environment is of
interest in surveillance applications. In particular, we
are developing a methodology to track individuals at sites
such as corridors, airports, borders, and secured buildings.
This requires that the viewing system be able to image the
tracked subject in a broad area over a long period of time.
In pursuit of this goal, our work has evolved from study-
ing human walking using a fixed camera [1, 2] to tracking
non-background objects using a single moving camera [3].
Studies of tracking with a single fixed camera [4, 2, 5] are
limited to a very narrow area due to the restricted viewing
angle of the system. A moving camera with a substantial
degree of rotational freedom [3] increases the viewing angle
to a certain degree; however, it complicates the implementa-
tion by requiring motion estimation of both the viewing
system and the subject of interest, and it still covers only a
limited viewing area. In this work, we chose to use
multiple fixed cameras mounted in the area of interest to
track and monitor the motion of individuals in sequences
of monocular grayscale images. As long as the subject is
within the area monitored by the fixed cameras, the image
of this subject will be contained in the view of at least one
camera. Based on this scenario, the problem of monitoring
a subject becomes that of tracking the subject of interest in
one camera view and matching that subject across different
camera views, where the cameras’ intrinsic parameters and
relative positions are assumed to be known a priori.
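Since the intrinsic parameters and relative poses of the cameras are known a priori, the system can, for example, reason about which camera views a given world point should fall into. The following sketch (in Python with numpy) illustrates this under a simple pinhole model; the Camera class, the project and visible_cameras routines, and the image-size check are hypothetical constructions introduced here for illustration and are not taken from this paper.

    import numpy as np

    class Camera:
        """Pinhole camera with known intrinsics K and pose (R, t).
        Illustrative representation only; the paper does not prescribe it."""
        def __init__(self, K, R, t, image_size):
            self.K = np.asarray(K, dtype=float)              # 3x3 intrinsic matrix
            self.R = np.asarray(R, dtype=float)              # world-to-camera rotation
            self.t = np.asarray(t, dtype=float).reshape(3)   # world-to-camera translation
            self.image_size = image_size                     # (width, height) in pixels

        def project(self, X_world):
            """Project a 3-D world point to pixel coordinates, or None if behind the camera."""
            X_cam = self.R @ np.asarray(X_world, dtype=float) + self.t
            if X_cam[2] <= 0:                                # point is behind the image plane
                return None
            u, v, w = self.K @ X_cam
            return np.array([u / w, v / w])

    def visible_cameras(cameras, X_world):
        """Return the indices of cameras whose image contains the projected point."""
        hits = []
        for i, cam in enumerate(cameras):
            p = cam.project(X_world)
            if p is not None:
                width, height = cam.image_size
                if 0 <= p[0] < width and 0 <= p[1] < height:
                    hits.append(i)
        return hits

In such a scheme, a subject leaving one camera's field of view can be handed off to whichever cameras still report it as visible.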
To establish correspondence between consecutive frames
from different cameras, conventional tracking methods
based on the similarity of the object shape, such as cross-
correlation and line-edge matching, are not applicable be-
cause the shape of an object's image varies drastically from
one camera view to another, and the whole body of a
moving human usually goes through complicated changes
during motion. Deformable template tracking for non-rigid
objects is not suitable either because the contours of the human
shape are not always complete in cluttered indoor scenes. In
addition, the continuity of the motion flow is not retained
across the views of multiple cameras. Optical flow methods [6],
which are widely used for featureless motion tracking, de-
mand small and smooth motion between frames, a restriction
that also does not hold in our case. In this paper, we propose
to track a moving human in different camera views based on
low level recognition of human motion [5]. A simpler form
of a 2D human model [7] is applied to detect moving human
subjects. Tracking between consecutive frames is mainly
based on the consistency of the position, velocity, and av-
erage intensity of feature points, formulated with multivariate
Gaussian models and evaluated across the views of various cam-
eras. The proposed algorithm is computationally efficient
and can be readily used in real-time applications.
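To make this criterion concrete, the sketch below scores candidate feature points in a subsequent frame by the log-density of a multivariate Gaussian over position, velocity, and average intensity, and keeps the most likely match. The stacking order of the feature vector, the diagonal covariance, and the function names are assumptions made here for illustration rather than the exact formulation developed in the paper.

    import numpy as np

    def gaussian_log_likelihood(x, mean, cov):
        """Log-density of a multivariate Gaussian N(mean, cov) evaluated at x."""
        x, mean, cov = (np.asarray(a, dtype=float) for a in (x, mean, cov))
        diff = x - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)          # squared Mahalanobis distance
        return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + maha)

    def most_likely_match(predicted, cov, candidates):
        """Return (index, score) of the candidate feature vector with the highest likelihood."""
        scores = [gaussian_log_likelihood(c, predicted, cov) for c in candidates]
        best = int(np.argmax(scores))
        return best, scores[best]

    # Hypothetical example: feature vector = (x, y, vx, vy, average intensity).
    predicted = np.array([120.0, 85.0, 3.0, -1.0, 0.45])
    cov = np.diag([25.0, 25.0, 4.0, 4.0, 0.01])           # assumed uncertainties
    candidates = [np.array([123.0, 83.5, 2.8, -0.9, 0.47]),
                  np.array([200.0, 40.0, -1.0, 2.0, 0.30])]
    best_index, best_score = most_likely_match(predicted, cov, candidates)

Since the same score can be evaluated for candidates observed by any of the cameras, the match need not be confined to a single view.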
2. Pre-processing
Three stages of pre-processing are performed before
tracking begins: 1) segmentation of the non-background
objects from the still background, 2) detection of human
subjects from the segmented non-background objects, and
3) feature extraction from the segmented human subjects.
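As a rough illustration of how these three stages could be chained, the sketch below uses a simple thresholded difference against a stored background image, a placeholder detection step, and centroid/average-intensity features; the function names, the threshold value, and the particular features are assumptions introduced here and do not correspond to the implementation described in this paper.

    import numpy as np

    def segment_non_background(frame, background, threshold=25):
        """Stage 1: mark pixels that differ notably from the still background (simple differencing)."""
        diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
        return diff > threshold                            # boolean foreground mask

    def detect_humans(mask):
        """Stage 2: placeholder for selecting human-shaped regions from the foreground mask."""
        # A real implementation would apply the 2-D human model cited in the introduction;
        # here all foreground pixels are returned as a single region.
        ys, xs = np.nonzero(mask)
        return [(ys, xs)] if ys.size else []

    def extract_features(frame, region):
        """Stage 3: per-region features such as the centroid and average intensity."""
        ys, xs = region
        return {"centroid": (float(xs.mean()), float(ys.mean())),
                "avg_intensity": float(frame[ys, xs].mean())}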
The quality of object segmentation plays a critical role in
later processing. If a non-background object is missed at