Supervised feature learning via ℓ2-norm regularized logistic regression for 3D object recognition
Fuhao Zou a, Yunfei Wang a,*, Yang Yang b, Ke Zhou c, Yunpeng Chen a, Jingkuan Song d
a School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
b School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
c Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China
d Department of Information Engineering and Computer Science, University of Trento, Trento 38100, Italy
article info
Article history:
Received 14 November 2013
Received in revised form 9 June 2014
Accepted 11 June 2014
Available online 23 October 2014
Keywords:
Logistic regression
Stochastic gradient ascent
3D object recognition
Feature learning
abstract
Advances in 3D digitalization techniques have produced a large number of digital 3D objects,
which are usually presented in graph, image or video format. In this paper, we focus on designing a novel
feature extraction method for 2D images of 3D objects, aimed at the recognition task. Motivated by the fact
that the responses a classifier generates for two objects can strongly reflect their semantic similarity, we
exploit a set of classifiers to construct the feature extraction method. The basic idea is as follows.
We first learn a classifier for each class and then combine the outputs of all classifiers into the object feature.
Because label information is taken into account, the proposed method is expected to be more powerful than
typical methods, such as SIFT-based bag-of-features and sparse coding, at discovering latent
semantic information, which helps to improve the accuracy of object recognition. In addition, to
make the proposed method scalable to training over massive data (and thereby improve its generalization
ability), ℓ2-norm regularized logistic regression is selected as the classifier and trained with stochastic
gradient ascent. In terms of time complexity, the proposed method is linear in the number of image pixels
and less expensive than the other two methods. These claims are supported by experimental results
obtained on four 3D datasets: COIL-100, 3Ddata, ETH-80 and the RGB-D dataset.
© 2014 Elsevier B.V. All rights reserved.
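The basic idea stated in the abstract (one classifier per class, with the classifier outputs combined into the object feature) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy 2D inputs, the hyperparameters `lam`, `lr` and `epochs`, the per-sample application of the ℓ2 penalty, and the use of sigmoid probabilities as the classifier "outputs" are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w, x):
    """Linear response w[:-1] . x + bias; the bias is stored in w[-1]."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_binary_lr(X, y, lam=0.01, lr=0.1, epochs=50, seed=0):
    """Stochastic gradient ascent on the l2-penalised log-likelihood.

    Each step ascends log p(y_i | x_i, w) - (lam / 2) * ||w||^2 for one
    sample (assumed convention: penalty applied per sample, bias excluded).
    Labels y are 0 or 1.
    """
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * (d + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = y[i] - sigmoid(score(w, X[i]))  # gradient of the log-likelihood
            for j in range(d):
                w[j] += lr * (err * X[i][j] - lam * w[j])
            w[d] += lr * err
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """Learn one binary classifier per class (that class against all others)."""
    return {
        c: train_binary_lr(X, [1 if l == c else 0 for l in labels], **kwargs)
        for c in sorted(set(labels))
    }


def extract_feature(models, x):
    """Feature vector = responses of all per-class classifiers, in sorted class order."""
    return [sigmoid(score(models[c], x)) for c in sorted(models)]


# Toy data (hypothetical): three well-separated classes in 2D.
X = [[1.0, 0.1], [0.9, -0.1], [0.1, 1.0], [-0.1, 0.9], [-1.0, -0.9], [-0.9, -1.0]]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(X, labels, epochs=200)
feature = extract_feature(models, [0.95, 0.0])  # one response per class
```

In this sketch the feature dimension equals the number of classes, and each stochastic update costs time linear in the input dimension.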
1. Introduction
With the rapid development of 3D modeling and 3D digital image/video
capture, we have witnessed exponential growth of 3D digital content,
such as 3D graphs, 3D images and 3D TV/movies [21,30]. Because 3D
digital works offer a more vivid and lively visual experience than 2D
ones, research on 3D digital content has attracted a lot of attention
in the multimedia community, including semantic analysis [34,32],
scene understanding and retrieval [11,6] and recognition [33,7] for
3D objects. As is well known, the feature representation of 3D digital
objects plays a fundamental role in multimedia analysis and
understanding. Thus, it is highly worthwhile to investigate how to
extract discriminative features for 3D objects. To simplify the
problem, we mainly concentrate here on extracting features for
2D images of 3D objects.
In principle, features can be roughly grouped into three classes:
low level features, middle level features and top level features.
Generally, low level features are built on low level information
of the 3D objects, e.g., texture information [27,19,12], shape [30,4],
color moments [25], Hu's moment invariants [25] and so on. In
addition, according to whether the region of interest of a feature
covers a local part of the image or the whole image, low level
features are also classified into local features and global features. Most
local features represent texture in an image patch. For example, SIFT
features use histograms of gradient orientations [19] of the local
patch. Global features include contour representations [28],
shape descriptors [4], and texture features [27]. Overall, both local and
global features are intended to capture the distinctive characteristics of
3D objects while resisting geometric and photometric distortions
such as translation, rotation, scale, occlusion, clutter and
illumination changes.
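To make the notion of a histogram of gradient orientations over a local patch concrete, the following is a simplified sketch. It is not the full SIFT descriptor (it has no keypoint detection, spatial sub-cells, Gaussian weighting or rotation normalization). The function name `orientation_histogram`, the 8-bin default and the central-difference gradients are assumptions chosen for illustration.

```python
import math


def orientation_histogram(patch, bins=8):
    """Magnitude-weighted histogram of gradient orientations.

    `patch` is a grayscale patch given as a list of equal-length rows.
    Gradients use central differences on interior pixels only.
    Returns a list of `bins` values that sum to 1 (all zeros for a flat patch).
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue  # flat pixel: no orientation to vote for
            angle = math.atan2(gy, gx) % (2.0 * math.pi)  # map to [0, 2*pi)
            b = int(angle / (2.0 * math.pi) * bins) % bins
            hist[b] += magnitude
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


# Hypothetical 5x5 patch whose intensity increases from left to right.
ramp = [[float(c) for c in range(5)] for _ in range(5)]
hist = orientation_histogram(ramp)
```

For the left-to-right ramp every interior gradient points along the positive x axis, so all of the mass falls into the first bin.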
Although local features are robust, they are handcrafted and
prone to the “semantic gap” problem. Namely, a low level feature
cannot accurately match its top level semantic information. As a
result, similar objects are more likely to be far apart in the low
level feature space, which will significantly degrade the performance
Neurocomputing
http://dx.doi.org/10.1016/j.neucom.2014.06.089
0925-2312/© 2014 Elsevier B.V. All rights reserved.
* Corresponding author.
E-mail address: yunfeiwang@hust.edu.cn (Y. Wang).
Neurocomputing 151 (2015) 603–611