Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
Robust facial landmark tracking via cascade regression
Qingshan Liu, Jing Yang, Jiankang Deng, Kaihua Zhang⁎
Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing, China
ARTICLE INFO
Keywords:
Face detection
Face alignment
Face tracking
Cascade regression
ABSTRACT
Recently, tremendous improvements have been achieved in facial landmark localization on static images.
However, detecting and tracking facial shapes in sequential images remains challenging due to the large
appearance variations in unconstrained videos. To address this issue, we present a robust facial landmark
tracking system based on cascade regression that handles several of the challenges arising in sequential
images. Specifically, our system employs a pose-based cascade shape regression model to predict the
facial landmark locations. This model decreases the shape variance in the model learning stage, making
the learned regression model more robust to large pose variations. In addition, we explore a pose
tracking model to enhance the temporal continuity between adjacent frames, and leverage the Kalman
filter to make the predicted shape smoother and more stable. Finally, we incorporate into the system a
re-initialization mechanism that uses the facial landmarks as position priors, which is able to
effectively and accurately relocate the face when it is misaligned or lost. Experiments on the LFPW,
Helen, 300-W and 300-VW datasets illustrate the superiority of the proposed system over state-of-the-art
approaches, and it is worth emphasizing that our method won category one of the 300-VW competition.
1. Introduction
Facial landmark localization is among the most popular and well-studied
problems in computer vision [1], with a wide range of applications such
as facial attribute analysis [2], face verification [3–5], and face
segmentation, tracking and recognition [6–14], to name a few. Designing
a robust facial landmark localization system is a great challenge due to
extensive rigid and non-rigid face variations, along with unconstrained
real-world imaging conditions such as illumination changes and
occlusions. In the past two decades, numerous algorithms have been
proposed [15,16] for facial landmark localization, which can be roughly
divided into two major categories: generative methods and
discriminative methods.
Generative methods typically optimize the shape parameters iteratively
so that a facial deformable model best reconstructs the input image.
Active Shape Models (ASMs) [17–20] and Active Appearance Models (AAMs)
[21–26] are two typical representatives. In ASMs, a global shape model
is constructed by applying Principal Component Analysis (PCA) to the
aligned training set, and the appearance is modeled locally with
discriminatively learned templates. In AAMs, the shape model shares the
same point distribution as in ASMs, while the global appearance is
modeled by PCA after removing shape variation in the canonical
coordinate frame.
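To make the PCA shape model concrete, the following is a minimal NumPy sketch. It is an illustration under stated assumptions, not the implementation of [17–20]: the shapes are assumed to be already aligned (for example by Procrustes analysis) and flattened into vectors of length 2L for L landmarks, and the function names fit_shape_model, project and reconstruct are hypothetical.

```python
import numpy as np

def fit_shape_model(shapes, n_components):
    """Fit a linear (PCA) shape model.

    shapes: (N, 2L) array; each row is one ALIGNED landmark vector
            (assumption: alignment has been done beforehand).
    Returns the mean shape, the top principal directions as rows of
    `basis`, and the variance captured by each direction.
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    # SVD of the centered data: rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    variances = singular_values[:n_components] ** 2 / (len(shapes) - 1)
    return mean_shape, basis, variances

def project(shape, mean_shape, basis):
    """Shape parameters b of a single shape vector."""
    return basis @ (shape - mean_shape)

def reconstruct(params, mean_shape, basis):
    """Shape vector generated by parameters b: mean + basis^T b."""
    return mean_shape + basis.T @ params
```

In this sketch a shape s is approximated by reconstruct(project(s, ...), ...), and the vector returned by project plays the role of the shape parameters that generative methods optimize iteratively.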
Discriminative methods attempt to infer a face shape through a
discriminative regression function that directly maps textural features
to shapes. In [27], a cascaded regression method built on pose-indexed
features was introduced for pose estimation, with excellent
performance. Cao et al. [28] integrate a two-level boosted regression
framework, shape-indexed features and a feature selection method to
make the regression more effective and efficient. Xiong et al. [29]
concatenate the SIFT features of each landmark as the feature
representation and obtain a regression matrix via linear regression. In
[30], a learning strategy is devised for a cascaded regression approach
by considering the structure of the problem.
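The cascaded linear regression idea described above can be sketched as follows. This is a toy illustration under loudly stated assumptions, not the method of [28] or [29]: the function toy_features is a hypothetical stand-in for shape-indexed descriptors such as SIFT, each "image" is represented only by its true shape, and each stage is fitted by ridge-regularized least squares. All function names and parameters are hypothetical.

```python
import numpy as np

def toy_features(true_shapes, est_shapes):
    """Hypothetical stand-in for shape-indexed descriptors (e.g. SIFT).

    ASSUMPTION: a real system samples the image around each estimated
    landmark. Here the 'image' is represented by its true shape, and the
    feature is a saturating function of the offset plus a bias column.
    """
    offsets = true_shapes - est_shapes
    bias = np.ones((offsets.shape[0], 1))
    return np.hstack([np.tanh(offsets), bias])

def train_cascade(true_shapes, init_shapes, n_stages=3, ridge=1e-3):
    """Learn one linear regressor per stage, each predicting the
    remaining shape residual from features at the current estimate."""
    est = init_shapes.copy()
    stages = []
    for _ in range(n_stages):
        feats = toy_features(true_shapes, est)
        residual = true_shapes - est
        # Ridge least squares: R = (F^T F + ridge * I)^-1 F^T residual
        gram = feats.T @ feats + ridge * np.eye(feats.shape[1])
        R = np.linalg.solve(gram, feats.T @ residual)
        stages.append(R)
        est = est + feats @ R  # update training shapes for the next stage
    return stages

def apply_cascade(images, init_shapes, stages):
    """Refine initial shapes stage by stage.

    In this toy setting `images` are the true shapes that toy_features
    uses as a proxy for image content.
    """
    est = init_shapes.copy()
    for R in stages:
        est = est + toy_features(images, est) @ R
    return est
```

Because the features are recomputed at the updated shape estimate after every stage, later regressors operate on smaller residuals, which is the property that makes the cascade more accurate than a single regression step.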
Despite the demonstrated success of facial landmark localization in
static images, less attention has been paid to facial landmark tracking
in long videos [31–33], owing to challenging factors such as
expression, illumination, occlusion, pose and image quality encountered
in unconstrained videos, and to the lack of a dedicated benchmark.
Fortunately, the 300-VW challenge [34] has recently presented a new
comprehensive benchmark that covers faces in unconstrained
environments, under various lighting conditions, with arbitrary
expressions and possibly occluded by other objects.
This paper is an extension of our previous work, which was accepted at
ICCVW 2015 [35]. The main contributions of this paper are as follows.
We construct a novel system based on cascade regression for facial
landmark tracking, the main idea of which is
http://dx.doi.org/10.1016/j.patcog.2016.12.024
Received 15 July 2016; Received in revised form 22 December 2016; Accepted 22 December 2016
⁎ Corresponding author.
E-mail address: zhkhua@gmail.com (K. Zhang).
0031-3203/ © 2016 Elsevier Ltd. All rights reserved.