3D Constrained Local Model for Rigid and Non-Rigid Facial Tracking
Tadas Baltrušaitis    Peter Robinson
University of Cambridge Computer Laboratory
15 JJ Thomson Avenue
tb346@cl.cam.ac.uk pr10@cl.cam.ac.uk
Louis-Philippe Morency
USC Institute for Creative Technologies
12015 Waterfront Drive
morency@ict.usc.edu
Abstract
We present a 3D Constrained Local Model (CLM-Z) for robust facial feature tracking under varying pose. Our approach integrates both depth and intensity information in a common framework. We show the benefit of our CLM-Z method in both accuracy and convergence rates over the regular CLM formulation through experiments on publicly available datasets. Additionally, we demonstrate a way to combine a rigid head pose tracker with CLM-Z that benefits rigid head tracking. We show better performance than current state-of-the-art approaches in head pose tracking with our extension of the generalised adaptive view-based appearance model (GAVAM).
1. Introduction
Facial expression and head pose are rich sources of information which provide an important communication channel for human interaction. Humans use them to reveal intent, display affection, express emotion, and help regulate turn-taking during conversation [1, 12]. Automated tracking and analysis of such visual cues would greatly benefit human computer interaction [22, 31]. A crucial initial step in many affect sensing, face recognition, and human behaviour understanding systems is the estimation of head pose and the detection of certain facial feature points such as eyebrows, corners of eyes, and lips. Tracking these points of interest allows us to analyse their structure and motion, and helps with registration for appearance-based analysis. This is an interesting and still unsolved problem in computer vision. Current approaches still struggle with person-independent landmark detection and in the presence of large pose and lighting variations.
There have been many attempts of varying success at tackling this problem, one of the most promising being the Constrained Local Model (CLM) proposed by Cristinacce and Cootes [10], and the various extensions that followed [18, 23, 27]. Recent advances in CLM fitting and response functions have shown good results in terms of accuracy and convergence rates in the task of person-independent facial feature tracking. However, they still struggle under poor lighting conditions.

Figure 1. Response maps of three patch experts: (A) face outline, (B) nose ridge and (C) part of chin. Logistic regressor response maps [23, 27] using intensity contain strong responses along the edges, making it hard to find the actual feature position. By integrating response maps from both intensity and depth images, our CLM-Z approach mitigates the aperture problem.
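To make the notion of a patch-expert response map concrete, the sketch below scores a logistic regressor at every position of a local image region. All names, shapes, and the patch normalisation are illustrative assumptions, not the implementation used in the paper or in [23, 27].

```python
import numpy as np

def patch_response(image, w, b):
    """Illustrative patch-expert response map.

    A linear logistic regressor with weights ``w`` (a 2D template) and
    bias ``b`` is evaluated at every valid position of ``image``.  The
    zero-mean, unit-norm patch normalisation is an assumption made here
    for illustration.
    """
    ph, pw = w.shape
    h, wi = image.shape
    resp = np.zeros((h - ph + 1, wi - pw + 1))
    for y in range(resp.shape[0]):
        for x in range(resp.shape[1]):
            patch = image[y:y + ph, x:x + pw].astype(float)
            patch = patch - patch.mean()          # zero mean
            n = np.linalg.norm(patch)
            if n > 0:
                patch = patch / n                 # unit norm
            # logistic (sigmoid) output in (0, 1): the "probability"
            # that the feature point is centred at this position
            resp[y, x] = 1.0 / (1.0 + np.exp(-(np.sum(w * patch) + b)))
    return resp
```

An intensity-only response map of this kind is exactly what produces the edge-aligned ridges shown in Figure 1: along an edge, every window looks similar, so the regressor fires along the whole contour.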
In this paper, we present a 3D Constrained Local Model (CLM-Z) that takes full advantage of both depth and intensity information to detect facial features in images and track them across video sequences. The use of depth data allows our approach to mitigate the effect of lighting conditions. In addition, it allows us to reduce the effects of the aperture problem (see Figure 1), which arises because patch responses are strong along edges but not across them. An additional advantage of our method is the option to use depth-only CLM responses when no intensity signal is available or lighting conditions are inadequate.
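The fusion idea can be illustrated with a minimal sketch: normalise the intensity and depth response maps and blend them, so a ridge-like intensity response (ambiguous along an edge) is disambiguated by a depth response that peaks at the true feature location. The equal weights and the normalisation scheme are assumptions for this example, not the paper's formulation.

```python
import numpy as np

def combined_response(resp_intensity, resp_depth, w_i=0.5, w_d=0.5):
    """Blend intensity and depth response maps (illustrative sketch).

    Each map is shifted to be non-negative and scaled to sum to one,
    then the two are combined with weights ``w_i`` and ``w_d``
    (hypothetical equal weighting here).
    """
    def normalise(r):
        r = r - r.min()
        s = r.sum()
        return r / s if s > 0 else np.full_like(r, 1.0 / r.size)
    return w_i * normalise(resp_intensity) + w_d * normalise(resp_depth)

# Aperture-problem toy case: intensity fires along a whole edge
# (an entire column), while depth fires only at the true landmark.
resp_i = np.zeros((5, 5)); resp_i[:, 2] = 1.0      # ridge along an edge
resp_d = np.zeros((5, 5)); resp_d[3, 2] = 1.0      # peak at the landmark
combined = combined_response(resp_i, resp_d)
peak = np.unravel_index(np.argmax(combined), combined.shape)
```

In this toy case the intensity map alone cannot localise the point along the edge, but the combined map has a unique maximum at the landmark position `(3, 2)`.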
Furthermore, we propose a new tracking paradigm which
integrates rigid and non-rigid facial tracking. This paradigm
978-1-4673-1228-8/12/$31.00 ©2012 IEEE