FACE-TLD: TRACKING-LEARNING-DETECTION APPLIED TO FACES
Zdenek Kalal
†
, Krystian Mikolajczyk
†
, Jiri Matas
‡
†
Centre for Vision, Speech and Signal Processing, University of Surrey, UK
‡
Center for Machine Perception, Czech Technical University, Czech Republic
ABSTRACT
A novel system for long-term tracking of a human face in
unconstrained videos is built on Tracking-Learning-Detection
(TLD) approach. The system extends TLD with the concept
of a generic detector and a validator which is designed for
real-time face tracking resistent to occlusions and appearance
changes. The off-line trained detector localizes frontal faces
and the online trained validator decides which faces corre-
spond to the tracked subject. Several strategies for build-
ing the validator during tracking are quantitatively evaluated.
The system is validated on a sitcom episode (23 min.) and a
surveillance (8 min.) video. In both cases the system detects-
tracks the face and automatically learns a multi-view model
from a single frontal example and an unlabeled video.
Index Terms— long-term face tracking, learning, detec-
tion, verification, real-time
1. INTRODUCTION
Long-term real-time tracking of human faces in uncon-
strained environments is a challenging problem: given a
single example of a specific face, track the face in a video
that may include frame cuts, sudden appearance changes,
long-lasting occlusions etc. In such environments, the frame-
by-frame tracking meets face detection and verification at
one point with a common goal to determine the location of
the specific face. This paper proposes a novel solution that is
suitable in such situations.
Two approaches are used for modeling an object ap-
pearance in tracking: static and adaptive. Static models [1]
assume that the object appearance change is limited and
known. Unexpected changes of the object appearance can not
be tracked. This drawback is addressed by adaptive meth-
ods [2] which update the object model during tracking. The
underlying assumption is that every update is correct. Every
incorrect update brings error to the model that accumulates
over time and causes drift. In the context of faces, the drift
problem has been addressed by introduction of so called
visual constraints [3]. Even though this approach demon-
strated increased robustness and accuracy, its performance
was tested only on videos where the face was in the field
of view. In scenarios where a face moves in and out of the
Fig. 1. Our system tracks, learns and detects a specific face in
real-time in unconstrained videos.
frame, face re-detection is essential. Face detection have been
extensively studied [4] and a range of ready-to-use face de-
tectors are available [5] which enable tracking-by-detection.
Apart from expensive offline training, the disadvantage of
tracking-by-detection is that all faces have the same model
and therefore the identities can not be distinguished. To
elevate this problem, Li et al. [6] proposed a face tracking
algorithm that splits the face model into three parts with dif-
ferent lifespan. This makes the tracker suitable for low-frame
rate videos but the longest period the face can disappear from
the camera view is limited. Another class of approaches for
face tracking was developed as part of automatic character
annotation in video [7]. These systems can handle the sce-
nario considered in this paper, but they have been designed
for offline processing and adaptation for real-time tracking is
not straightforward.
In this work, we build on an approach called Tracking-
Learning-Detection (TLD) [8], whose learning part was ana-
lyzed in [9]. The TLD method was designed for long-term
tracking of arbitrary objects in unconstrained environments.
The object was tracked and simultaneously learned in order
to build a detector that supports the tracker once it fails. The
detector was build upon the information from the first frame
as well as the information provided by the tracker. This paper
has three contributions w.r.t. TLD: (i) Additional source of
information (offline detector) is embedded to the TLD frame-
work which simplifies the learning task in cases when the ob-