Laplacian regularized locality-constrained coding
for image classification
Huaqing Min a, Mingjie Liang b,*, Ronghua Luo b, Jinhui Zhu a
a School of Software Engineering, South China University of Technology, Guangzhou 510006, China
b School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
Article info
Article history:
Received 20 January 2014
Received in revised form
10 March 2015
Accepted 29 July 2015
Communicated by Xiaoqin Zhang
Available online 7 August 2015
Keywords:
Image classification
Feature coding
Locality-constrained
Laplacian regularization
Abstract
Feature coding, which encodes local features extracted from an image with a codebook and generates a
set of codes for efficient image representation, has shown very promising results in image classification.
Vector quantization is the simplest and most widely used method for feature coding. However, it suffers
from large quantization errors and assigns dissimilar codes to similar features. To alleviate these
problems, we propose Laplacian Regularized Locality-constrained Coding (LapLLC), wherein a locality
constraint is used to favor nearby bases for encoding, and Laplacian regularization is integrated to
preserve the code consistency of similar features. By incorporating a set of template features, the
objective function used by LapLLC can be decomposed, and each feature is encoded by solving a linear
system. Additionally, a k-nearest-neighbor technique is employed to construct a much smaller linear
system, so that fast approximated coding can be achieved. Therefore, LapLLC provides a novel way for
efficient feature coding. Our experiments on a variety of image classification tasks demonstrate the
effectiveness of the proposed approach.
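The coding scheme summarized above can be written as an objective of the following form (a sketch reconstructed from the abstract's description and the standard locality-constrained coding formulation; the symbols $\lambda$, $\beta$, $d_i$ and $w_{ij}$ are assumptions, since the exact formulation appears in the body of the paper):

$$\min_{C}\;\sum_{i=1}^{n}\Big(\big\lVert x_i - B c_i\big\rVert_2^2+\lambda\big\lVert d_i\odot c_i\big\rVert_2^2\Big)+\beta\,\operatorname{tr}\big(C L C^{\top}\big),$$

where $x_i$ is a local feature, $B$ the codebook, $c_i$ the code of $x_i$, $d_i$ a locality adaptor that penalizes bases far from $x_i$, and $L = D - W$ the graph Laplacian of a feature-similarity graph $W$. Since $\operatorname{tr}(C L C^{\top})=\tfrac{1}{2}\sum_{i,j}w_{ij}\lVert c_i-c_j\rVert_2^2$, the last term encourages similar features to receive similar codes.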
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Classifying images into semantic categories, which is also
referred to as image classification, is a problem of great interest
in both research and practice. On one hand, it is a very challenging
problem due to a number of factors involved in images, such as a
wide range of illumination conditions, tremendous changes in
viewpoints, and large intra-class variation. On the other hand, it is
an essential issue in computer vision and image processing; the
techniques for solving image classification can be applied to a
large number of practical fields, including video tracking and
surveillance [1,2], content-based image indexing and retrieval
[3,4], and intelligent robot localization and navigation [5,6].
The potential and challenges of image classification have attracted
considerable attention from researchers in recent years.
One of the key issues for image classification is to find a
suitable way to represent images. Many image representation
models have been proposed, including the ones based only on
low-level features and the ones concerning semantic modeling [7].
The Bag-of-Words (BoWs) model [8] is one of the most popular
methods belonging to the latter category. In the BoWs model, local
features are first extracted from an image and quantized into
“visual words”; a histogram is then formed by counting the
occurrences of the visual words. Representing an image by a set of local
features has enabled the BoWs model to achieve decent performance in
image classification despite changes in viewpoint, illumination
variation and partial occlusion. However, researchers have also noted
several drawbacks of this model.
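The quantization-and-counting pipeline just described can be sketched as follows (a minimal NumPy illustration; the function name `bow_histogram` and the hard-assignment choice are ours, not the paper's):

```python
import numpy as np

def bow_histogram(features, codebook):
    """Encode local descriptors as a normalized visual-word histogram.

    features: (n, d) array of local descriptors (e.g. SIFT).
    codebook: (k, d) array of visual words, e.g. learned by k-means.
    """
    # Squared Euclidean distance from every feature to every codeword.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Hard vector quantization: each feature votes for its nearest word.
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)  # normalize to a distribution
```

The hard argmin assignment is exactly the step behind the quantization errors discussed below: two nearby features that straddle a Voronoi boundary between codewords receive entirely different codes.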
One evident drawback is the spatial information loss. BoWs
model considers an image as an orderless collection of features,
and discards the spatial relationship between them. This can
severely limit the descriptive power of the image representation.
To incorporate the spatial information, Lazebnik et al. [9] introduce
Spatial Pyramid Matching (SPM). Motivated by the work of
Grauman et al. [10], they partition the image into increasingly
finer spatial sub-regions and compute a histogram of local features
for each sub-region. The histograms from all regions are then
concatenated to form a final representation of the image. Com-
pared to the original BoWs model, this technique has been shown
to be capable of improving the performance substantially. Plenty
of recent studies are built on the SPM framework, such as [11–14].
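The SPM construction can be sketched by computing one histogram per grid cell at each pyramid level and concatenating them (again a NumPy sketch with a hypothetical helper name, not the authors' implementation):

```python
import numpy as np

def spatial_pyramid(features, positions, codebook, image_size, levels=(1, 2, 4)):
    """Concatenate per-cell visual-word histograms over increasingly fine grids.

    features:   (n, d) local descriptors.
    positions:  (n, 2) array of (x, y) locations of the descriptors.
    codebook:   (k, d) visual words.
    image_size: (width, height) of the image.
    levels:     grid resolutions; (1, 2, 4) mirrors the usual 3-level pyramid.
    """
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                # nearest visual word per feature
    k, (w, h) = len(codebook), image_size
    hists = []
    for g in levels:                         # a g x g grid at each level
        cx = np.minimum((positions[:, 0] * g / w).astype(int), g - 1)
        cy = np.minimum((positions[:, 1] * g / h).astype(int), g - 1)
        cell = cy * g + cx                   # flat cell index per feature
        for c in range(g * g):
            hists.append(np.bincount(words[cell == c], minlength=k))
    v = np.concatenate(hists).astype(float)
    return v / max(v.sum(), 1.0)
```

For brevity the sketch weights all pyramid levels equally, whereas SPM [9] weights finer levels more heavily in its pyramid match kernel.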
Another drawback is related to quantization errors [11,15].
Commonly, local features are converted to visual words by vector
quantization in the traditional BoWs model. Specifically, each local
feature is assigned to the entry with the closest distance in the codebook.
http://dx.doi.org/10.1016/j.neucom.2015.07.084
* Corresponding author.
E-mail addresses: hqmin@scut.edu.cn (H. Min),
mjie.liang@gmail.com (M. Liang), rhluo@scut.edu.cn (R. Luo),
csjhzhu@scut.edu.cn (J. Zhu).
Neurocomputing 171 (2016) 1486–1495