Efficient image classification via sparse
coding spatial pyramid matching
representation of SIFT-WCS-LTP feature
ISSN 1751-9659
Received on 24th November 2014
Revised on 14th July 2015
Accepted on 21st July 2015
doi: 10.1049/iet-ipr.2015.0329
www.ietdl.org
Mingming Huang, Zhichun Mu ✉, Hui Zeng
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083,
People’s Republic of China
✉ E-mail: mu@ies.ustb.edu.cn
Abstract: Shape and texture information are critical to the accuracy of image classification systems. In this study, the
authors propose a novel descriptor called weighted centre-symmetric local ternary pattern (WCS-LTP), which better
characterises local image texture. Then, based on the proposed WCS-LTP descriptor, they introduce a new local
scale invariant feature transform and WCS-LTP (SIFT–WCS-LTP) feature extraction approach. Compared with
conventional local CS-LTP and SIFT features, the authors’ proposed SIFT–WCS-LTP feature not only captures the
shape information of images, but also tends to extract more precise texture information. Finally, a SIFT–WCS-LTP
feature-based sparse coding spatial pyramid matching (ScSPM) representation is proposed for image
classification. Extensive experimental results demonstrate the effectiveness of their proposed SIFT–WCS-LTP
feature-based ScSPM representation classification algorithm.
1 Introduction
Image classification, which annotates an image with one or multiple
labels corresponding to different semantic classes, is an important
research topic in the areas of computer vision, pattern recognition,
and machine learning. Moreover, image classification has attracted
an increasing amount of attention over the past few years, because
of its wide use in a broad range of applications such as human–
computer interaction [1], video surveillance [2] and robot path
planning [3].
Standard image classification pipelines use features (descriptors)
in combination with classifiers [4–6]. For good classification,
features should be descriptive and discriminative and, at the same
time, invariant to different transformations and robust enough to
tolerate intra-class variation. In recent years, much effort has been
invested in developing features that yield good classification, and
the focus in extracting features for classification has shifted from
global features, which describe the object as a whole, to local features.
Well-known contributions include SIFT (scale invariant feature
transform) [7], PCA–SIFT (SIFT combined with principal component
analysis, PCA) [8], SURF (speeded-up robust features) [9] and so
on. Among them, the SIFT descriptor, proposed over a decade
ago, is currently among the best quality descriptors for image
classification. It relies on a three-dimensional histogram of
gradient locations and orientations where the contribution to bins
is weighted by the gradient magnitude and a Gaussian window
overlaid over the region. Inspired by the high discriminative power
and robustness of SIFT, many researchers have developed a variety
of local descriptors along the lines of SIFT. The PCA–SIFT
descriptor is an extension of the SIFT descriptor, which applies
PCA to reduce the dimensionality of the SIFT descriptor vector
from 128 to 36. The SURF descriptor also relies on local gradient
histograms and speeds up the gradient computations using integral
images, while largely preserving the quality of SIFT.
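The SIFT-style histogram described above can be illustrated with a minimal sketch. It assumes a 16 × 16 grey-level patch, a 4 × 4 grid of cells and 8 orientation bins, which yields the 128 dimensions mentioned above. The function name `sift_like_descriptor`, the Gaussian width and the hard binning are illustrative assumptions; keypoint detection, rotation normalisation and bin interpolation are omitted, so this is not the full descriptor of [7].

```python
import math

def sift_like_descriptor(patch, cells=4, bins=8):
    """Simplified SIFT-style histogram for a square grey-level patch (list of rows)."""
    size = len(patch)
    cell = size // cells
    sigma = size / 2.0           # assumed width of the Gaussian window
    centre = (size - 1) / 2.0
    hist = [0.0] * (cells * cells * bins)
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            # Central-difference intensity gradient.
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * bins) % bins
            # Weight by gradient magnitude and a Gaussian centred on the patch.
            w = math.exp(-((x - centre) ** 2 + (y - centre) ** 2) / (2 * sigma ** 2))
            cy = min(y // cell, cells - 1)
            cx = min(x // cell, cells - 1)
            hist[(cy * cells + cx) * bins + o] += w * mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

# Toy patch: a horizontal intensity ramp, so every gradient points along +x.
patch = [[float(x) for x in range(16)] for _ in range(16)]
desc = sift_like_descriptor(patch)
print(len(desc))  # 128
```

For the ramp patch, only the first orientation bin of each cell is non-zero, which shows how the histogram encodes edge direction per location.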
To better take advantage of local features, the popular bag-of-visual-words
(BoV) model [10] is used in image
classification. The BoV method represents an image as an
orderless collection of local features, and its descriptive ability is
severely limited because it discards the spatial information of
features. To overcome this problem, one popular extension of
the BoV method, called spatial pyramid matching (SPM) [11],
was proposed and has been shown to be effective for image
classification. The SPM partitions an image into several segments
at different scales, then computes the BoV histogram within each
segment and concatenates all the histograms to form a
high-dimensional vector representation of the image. For the purpose of
reducing the training complexity and improving the scalability, the
sparse coding SPM (ScSPM) method [12], which takes into account
some aspects of the spatial layout of the image, was proposed and
contributes to improving classification performance. Csurka et al.
[13] proposed a BoV-based method for image classification, in which
a set of SIFT
features is first extracted and an image is then represented by the
BoV frequency histogram of SIFT features.
Wang et al. [14] developed a new method of image classification
using histogram of oriented gradient features, which are
computed on a dense grid of uniformly spaced cells. In addition,
Akata et al. [15] applied PCA to reduce the dimensionality of the
SIFT descriptor from 128 to 64 for image classification.
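The BoV and SPM representations described above can be sketched as follows. This is a minimal illustration that assumes local features have already been assigned to visual-word indices, and that the pyramid uses 1 × 1, 2 × 2 and 4 × 4 grids; the function name `spatial_pyramid_histogram` and the toy data are hypothetical. It uses hard-assignment counts, whereas a sparse coding variant would pool sparse codes in each cell instead.

```python
def spatial_pyramid_histogram(points, words, width, height, vocab_size, levels=3):
    """Concatenate per-cell visual-word histograms over a pyramid of grids."""
    pyramid = []
    for level in range(levels):
        grid = 2 ** level  # 1x1, 2x2, 4x4 cells (assumed layout)
        hists = [[0] * vocab_size for _ in range(grid * grid)]
        for (x, y), w in zip(points, words):
            # Locate the cell containing the feature at this pyramid level.
            cx = min(int(x * grid / width), grid - 1)
            cy = min(int(y * grid / height), grid - 1)
            hists[cy * grid + cx][w] += 1
        for h in hists:
            pyramid.extend(h)
    return pyramid

# Toy input: four local features with positions and visual-word indices.
points = [(10, 10), (90, 10), (10, 90), (90, 90)]
words = [0, 1, 1, 2]
vec = spatial_pyramid_histogram(points, words, width=100, height=100, vocab_size=3)
print(len(vec))  # 63 = (1 + 4 + 16) cells * 3 words
print(vec[:3])   # [1, 2, 1]: the level-0 cell is the plain orderless BoV histogram
```

The level-0 block discards feature positions entirely, while the finer levels record which quadrant each visual word came from, which is the spatial information that plain BoV loses.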
Nowadays, images on websites or computers normally
contain complex backgrounds. Although local features have been
proven to be very effective in image classification, the accuracy of
classification is often limited by the presence of uninformative
local features that are typically extracted from the background [16]. The
SIFT feature is capable of capturing local object shape or edges
through the distributions of intensity gradients. For an image with
a simple background, the SIFT feature is able to accurately represent
the foreground objects without noise interference [17]. However, it
performs poorly when the image contains a complex background,
because a portion of the extracted features may come from
the noisy background. In contrast, the CS-LTP
(centre-symmetric local ternary pattern) descriptor [18] captures
the texture information of images but does not take shape
information into account. It can, however, filter out background
noise through local ternary patterns. Therefore, effective local
feature extraction approaches that capture both shape and
texture information still need to be investigated for image
classification.
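To make the idea of a centre-symmetric ternary pattern concrete, the following is a minimal sketch. It assumes the four opposite-neighbour pairs of a 3 × 3 window, a threshold `t` and a base-3 encoding; these choices, the digit convention and the function name `cs_ltp_code` are illustrative assumptions and may differ from the exact formulation in [18].

```python
def cs_ltp_code(img, y, x, t):
    """Base-3 code from the centre-symmetric neighbour pairs around pixel (y, x)."""
    # Opposite neighbours in the 3x3 window (assumed pair layout).
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)),
             ((-1, 1), (1, -1)), ((0, 1), (0, -1))]
    code = 0
    for i, ((dy1, dx1), (dy2, dx2)) in enumerate(pairs):
        diff = img[y + dy1][x + dx1] - img[y + dy2][x + dx2]
        # Ternary quantisation: differences within [-t, t] count as "equal".
        if diff > t:
            digit = 2
        elif diff < -t:
            digit = 0
        else:
            digit = 1
        code += digit * 3 ** i
    return code

flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
noisy = [[10, 11, 9], [10, 10, 11], [9, 10, 10]]   # small perturbations only
edge = [[50, 50, 50], [10, 10, 10], [10, 10, 10]]  # strong horizontal edge

print(cs_ltp_code(flat, 1, 1, t=3))   # 40
print(cs_ltp_code(noisy, 1, 1, t=3))  # 40, the noise stays below the threshold
print(cs_ltp_code(edge, 1, 1, t=3))   # 53
```

The flat and mildly noisy windows map to the same code, which illustrates how the ternary threshold suppresses small intensity fluctuations, while the edge window produces a different code.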
This paper investigates an effective algorithm based on ScSPM
representation of scale invariant feature transform and WCS-LTP
(SIFT–WCS-LTP) feature for image classification. Our feature
extraction scheme is first to construct a novel descriptor called
IET Image Processing
Research Article
IET Image Process., 2016, Vol. 10, Iss. 1, pp. 61–67
© The Institution of Engineering and Technology 2016