Two-Eye Model-Based Gaze Estimation from A Kinect Sensor

Xiaolong Zhou¹, Haibin Cai², Youfu Li³, and Honghai Liu²,⁴
Abstract— In this paper, we present an effective and accurate gaze estimation method based on a two-eye model of a subject, tolerant of free head movement, using a Kinect sensor. To determine the point of gaze accurately and efficiently, i) we employ a two-eye model to improve the estimation accuracy; ii) we propose an improved convolution-based means-of-gradients method to localize the iris center in 3D space; iii) we present a new personal calibration method that needs only one calibration point. The method approximates the visual axis as a line from the iris center to the gaze point to determine the eyeball centers and the Kappa angles. The final point of gaze can then be calculated using the calibrated personal eye parameters. We experimentally evaluate the proposed gaze estimation method on eleven subjects. Experimental results demonstrate that our method achieves an average estimation accuracy of around 1.99°, which outperforms many leading state-of-the-art methods.
I. INTRODUCTION
Gaze estimation determines the point of regard of a person and plays an important role in understanding human attention, feelings, and desires. It has been widely explored in many intelligent systems for virtual reality, human-computer interaction, human-robot interaction, human behavior analysis, and so on. Much gaze estimation research has concentrated on the pupil center corneal reflection technique. This kind of technique normally requires one or multiple infrared lights and high-quality cameras, which limits the system's potential for broader applications. Moreover, most existing gaze estimation systems have low tolerance toward head movement, which hinders their wide use.
Recently, Kinect-based 3D gaze estimation [1], [2], [3], [4], [5], [6], [7], [8] has attracted increasing attention since it is low-cost, non-intrusive, simple to set up, and allows free head movement. Generally, Kinect-based gaze estimation methods can be roughly classified into non-eye model-based methods and eye model-based methods. Non-eye model-based methods are typically appearance-based or regression-based. For example, Mora and Odobez [1] estimated 3D gaze
This work was supported in part by the National Natural Science Foundation of China (61403342, 61673329, U1509207, 61325019, 51575338) and the Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (2014KLA09).
¹Xiaolong Zhou is with the College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China. zxl@zjut.edu.cn
²Haibin Cai and Honghai Liu are with the School of Computing, University of Portsmouth, Portsmouth, UK.
³Youfu Li is with the Department of Mechanical and Biomedical Engineering, City University of Hong Kong, Hong Kong, China.
⁴Honghai Liu is with the State Key Laboratory of Mechanical Systems and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China.
from multimodal Kinect data and achieved an estimation accuracy with an average error of around 7.6°-12.6°. Furthermore, they proposed a geometric generative 3D gaze estimation method [2] based on an appearance generative process that modeled head-pose-rectified eye images recovered using RGB-D cameras, which improved the estimation accuracy to 6.3°. Cazzato et al. [3] incorporated the 3D head pose to estimate the final gaze direction according to the geometric relations among the sensor, the observer, and the target. They reported estimation errors of 6.9° for unaware users and 3.6° for informed users. The main benefit of non-eye model-based methods is that they require no specific personal calibration. However, the estimation accuracy of this kind of method is low (generally above 6°).
Different from the non-eye model-based methods, which estimate gaze using appearance or regression techniques, 3D eye model-based methods directly determine the gaze from the geometric relationship among the human eyes, the sensor, and the gaze point. For example, J. Li and S. Li [4] proposed an eye-model-based 3D gaze estimation method for a Kinect sensor. They built a head model based on the Kinect sensor and calibrated the eyeball center by having the subject gaze at a target in 3D space. The gaze direction was estimated after calibration, and the reported average estimation error was around 6°. Recently, they estimated the gaze from a color image based on an eye model with a known head pose [5]. They first determined the 3D eyeball center in a calibration procedure in which the subject gazed at the center of the color camera, and then estimated the 3D iris center using its contour and projection information. They reported average estimation errors for seven subjects of 5.9° vertically and 4.4° horizontally. Sun et al. [6] estimated the gaze direction based on a 3D geometric eye model, taking into account head movement and the deviation of the visual axis from the optical axis. They reported a high estimation accuracy of 1.4°-2.7°. However, their method involved many calibration procedures, such as screen-camera calibration and personal calibration with multiple calibration points.
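The geometric core shared by these 3D eye model-based methods is to shoot a ray from the eyeball center through the iris center and intersect it with the screen plane. The sketch below shows this ray-plane intersection in its simplest form, using the optical axis only; the cited methods additionally rotate the ray by the per-subject Kappa angle to obtain the visual axis. All names are illustrative choices of this sketch.

```python
import numpy as np

def gaze_point_on_plane(eyeball_center, iris_center, plane_point, plane_normal):
    """Intersect the optical-axis ray (eyeball center through iris center)
    with a plane such as the screen. A full model would first rotate the
    ray by the calibrated Kappa angle to obtain the visual axis."""
    e = np.asarray(eyeball_center, dtype=float)
    d = np.asarray(iris_center, dtype=float) - e
    d /= np.linalg.norm(d)                      # unit gaze direction
    n = np.asarray(plane_normal, dtype=float)
    denom = d @ n
    if abs(denom) < 1e-9:
        raise ValueError("gaze direction is parallel to the plane")
    t = ((np.asarray(plane_point, dtype=float) - e) @ n) / denom
    if t < 0:
        raise ValueError("the plane is behind the eye")
    return e + t * d                            # point of gaze on the plane
```

With coordinates in metres, an eyeball center at the origin and an iris center 5 cm in front and 1 cm to the side yield a gaze point shifted proportionally on a screen 0.5 m away.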
Although eye model-based gaze estimation methods can achieve higher accuracy (below 6°), they normally require specific personal calibration, which involves human interaction. Moreover, the estimation accuracy relies heavily on the number of calibration points: more calibration points generally lead to higher estimation accuracy but at the same time require more human interaction.
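As a concrete illustration of why a single calibration point can suffice under the approximation used in this paper (the visual axis taken as the line from the iris center to the known gaze point), the eyeball center can be recovered by stepping back from the observed 3D iris center along that line. The fixed iris-to-eyeball-center distance below is an assumed typical value for illustration, not a figure from the paper.

```python
import numpy as np

def calibrate_eyeball_center(iris_center, target_point, offset=0.012):
    """One-point calibration sketch: with the visual axis approximated by
    the line from the iris center to the calibration target, the eyeball
    center lies behind the iris along that line. `offset` (metres) is an
    assumed typical iris-to-eyeball-center distance, not a paper value."""
    i = np.asarray(iris_center, dtype=float)
    v = np.asarray(target_point, dtype=float) - i
    v /= np.linalg.norm(v)          # unit direction toward the gaze target
    return i - offset * v           # step back from the iris to the center
```

Once the eyeball center is fixed this way, subsequent frames only need the 3D iris center to define a gaze ray, which is what keeps the interaction cost at one point.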
Besides personal calibration, the 3D localization of the human iris is another key technique that affects the final gaze estimation accuracy. Currently, a large number of iris center
2017 IEEE International Conference on Robotics and Automation (ICRA)
Singapore, May 29 - June 3, 2017