HUMAN HAND DETECTION USING ROBUST LOCAL DESCRIPTORS
Jianwei Niu¹, Xiaoke Zhao¹, Muhammad Ali Abdul Aziz¹, Jiangwei Li², Kongqiao Wang², Aimin Hao¹
¹State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
²Nokia Research Center, Beijing, China
E-mail: niujianwei@buaa.edu.cn, zhaoke001@126.com, xerox414@hotmail.com,
{jiangwei.li, kongqiao.wang}@nokia.com, ham@buaa.edu.cn
ABSTRACT
Human hand detection in images remains a challenging task
due to variable lighting conditions, hand appearances and
background noise. In this paper, we present an effective
strategy based on feature fusion for detecting hands in
cluttered surroundings. To form the fusions, we propose
three novel noise-invariant features, namely: 1)
NCHOG (Noise Compensated Histogram of Oriented Gra-
dients), 2) NCLBP (Noise Compensated Local Binary Pat-
terns), and 3) HPCP (Histograms of Pairs of Circumference
Pixels). We show the superior performance of the NCHOG
and the NCLBP descriptors over their existing traditional
counterparts, i.e., HOG and LBP. Merging our novel features
with existing features in different permutations, and applying
Partial Least Squares (PLS) based feature weighting, yields
excellent detection results on our own dataset of hand images
with variegated and complex backgrounds.
Index Terms— Hand detection, NCHOG, NCLBP,
HPCP, PLS
1. INTRODUCTION
In recent years, a number of approaches for hand detection
have been presented. The authors in [1] tackle hand posture
recognition with some degree of success by using Haar-like
features. Nonetheless, their dataset consists of only images
with very simple backgrounds. Skin color segmentation has
been utilized by several approaches like [2, 3]. However,
these methods are sensitive to quickly changing or mixed
lighting conditions. Kölsch and Turk [4] use fanned boosting
detection for classification and achieve near real-time results.
The major drawback of the technique is its constraints on the
resolution and aspect ratio of the gesture template.
More recently, the idea of combining different features
into a larger feature set has been proposed in areas like object
detection, human detection and face detection. Hussain and
Triggs [5] employ a combination of HOG [6] (Histogram of
Oriented Gradients), LBP [7] (Local Binary Patterns, in its
color variant) and LTP [8] (Local Ternary Patterns) descriptors
for object detection. In [9], the authors use HOG, CF (Color
Frequency) and texture co-occurrence features for human
detection. Moreover,
works like [10, 11] detect humans accurately by using a mix-
ture of features. However, the effectiveness of the feature
fusion technique for the task of hand detection still remains
unexplored. In this paper, we aim to assess the suitability of
fusions of heterogeneous and complementary features for hand
detection. Our motivation is that a reliable hand detector can
facilitate many other tasks in human temporal analysis. We use
HOG and our proposed feature
NCHOG (Noise Compensated HOG) to encapsulate the ro-
bust edges of the hand. Then, to capture the distinct texture
of the hand, we make use of the CLBP (Color LBP), LTP and
our proposed descriptors: the CNCLBP (Color Noise Com-
pensated LBP) and the HPCP (Histograms of Pairs of Cir-
cumference Pixels). Finally, the color information is further
encoded using the CF feature.
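As a rough illustration of the fusion idea described above, the following Python sketch concatenates two standard descriptors into a single feature vector: a magnitude-weighted gradient-orientation histogram (a much simplified, cell-free stand-in for HOG) and a basic 8-neighbour LBP histogram. It does not implement NCHOG, CNCLBP, HPCP, LTP or CF. The function names, the bin counts and the L2 normalization are assumptions made for this example only.

```python
import numpy as np


def orientation_histogram(gray, n_bins=9):
    """Global histogram of unsigned gradient orientations, weighted by magnitude.

    Simplified stand-in for HOG: no cells, blocks, or block normalization.
    """
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)                  # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    bins = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-12)


def lbp_histogram(gray):
    """256-bin histogram of basic 3x3 LBP codes (border pixels are skipped)."""
    gray = gray.astype(np.float64)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Eight neighbours, visited clockwise starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / (np.linalg.norm(hist) + 1e-12)


def fused_descriptor(gray):
    """Feature fusion by concatenating the edge and texture descriptors."""
    return np.concatenate([orientation_histogram(gray), lbp_histogram(gray)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = rng.integers(0, 256, size=(32, 32))   # hypothetical detection window
    print(fused_descriptor(window).shape)
```

Each descriptor is normalized separately before concatenation so that neither dominates the fused vector purely because of its scale; any learned feature weighting would be applied to the fused vector afterwards.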
We make the following major contributions in this paper:
1) We propose three new noise-invariant features: NCHOG,
NCLBP and HPCP. The last feature is a histogram-based variant
of the CCS-POP (Circular Center Symmetric-Pairs of Pixels)
feature presented in [12]. We show experimentally that NCHOG is
more discriminative than HOG and, similarly, that NCLBP
outperforms LBP; 2) Based on our experiments, we find that the
feature set incorporating NCHOG, CNCLBP, LTP and CF exhibits
better performance than all other fusions in our feature
family, including the feature set HOG + CLBP + LTP proposed
by Hussain and Triggs [5].
The rest of the paper is structured as follows. Section 2
describes each of our proposed features (i.e., NCHOG, NCLBP
and HPCP), as well as the dimensionality reduction technique
and the classifier used. Section 3 presents feature
implementation details and discusses the results of the
experiments performed using individual features as well as
feature sets. Finally, conclusions are drawn in Section 4.
2. PROPOSED METHOD
2.1. Features
Selection of good visual features is crucial for reliable hand
detection as the hand postures are very rich in shape variation