LSOD: Local Sparse Orthogonal Descriptor for Image
Matching
Yiru Zhao, Yaoyi Li, Zhiwen Shao, Hongtao Lu∗
Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering
Department of Computer Science and Engineering, Shanghai Jiao Tong University, P.R.China
{yiru.zhao, dsamuel, shaozhiwen, htlu}@sjtu.edu.cn
ABSTRACT
We propose a novel method for feature description used for
image matching in this paper. Our method is inspired by the
autoencoder, an artificial neural network designed for learn-
ing efficient codings. Sparse and orthogonal constraints are
imposed on the autoencoder and make it a highly discrimi-
native descriptor. It is shown that the proposed descriptor
is not only invariant to geometric and photometric transfor-
mations (such as viewpoint change, intensity change, noise,
image blur and JPEG compression), but also highly efficient.
We compare it with existing state-of-the-art descriptors on
standard benchmark datasets. The experimental results show
that our LSOD method yields better performance in both
accuracy and efficiency.
Keywords
Image matching; autoencoder; local feature descriptor
1. INTRODUCTION
Local feature descriptors are fundamental to many computer
vision problems, such as image stitching [11], camera
calibration [19], and object detection [14]. The SIFT keypoint
detector and descriptor [12], proposed a decade ago, has
proved effective in many image matching scenarios [18, 20],
but it imposes a large computational cost, especially in
real-time applications such as simultaneous localization and
mapping (SLAM) systems. Many algorithms were proposed to
improve SIFT in the following years. One of them, SURF [3], is
faster but less accurate than SIFT. DSP-SIFT [5] proposes a
modification based on pooling gradient orientations. KAZE [1]
introduces a feature detection and description algorithm in
nonlinear scale spaces, and [2] accelerates it with a
descriptor called AKAZE.
On the other hand, machine learning and neural networks have
grown rapidly in recent years and
∗Corresponding author.
MM ’16, October 15-19, 2016, Amsterdam, Netherlands
© 2016 ACM. ISBN 978-1-4503-3603-1/16/10...$15.00
DOI: http://dx.doi.org/10.1145/2964284.2967217
[Figure 1 diagram: Image Patch → Sparse Orthogonal Autoencoder (input units X(1), X(2), X(3), ..., X(n-2), X(n-1), X(n); output units X(1)', X(2)', X(3)', ..., X(n-2)', X(n-1)', X(n)') → LSOD descriptor]
Figure 1: Illustration of calculating the LSOD descriptor for an image patch.
have achieved great success in many classical computer vision
problems, such as image classification [9] and action
recognition [8]. Inspired by the sparse autoencoder, a
well-known neural network model, we propose a new local image
feature descriptor. Using orthogonal features learned from an
image dataset, the autoencoder encodes an image patch into a
descriptor. We call our method the Local Sparse Orthogonal
Descriptor (LSOD); Figure 1 illustrates how it is computed. The
main contributions of this paper include:
• Enhancing the FAST detector with a median-filter scale
pyramid and an intensity centroid.
• Proposing a method for training a sparse orthogonal
autoencoder that describes local image feature patches.
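The intensity centroid named in the first contribution is commonly defined through image moments: the centroid of a patch is (m10/m00, m01/m00), and the direction from the patch center to that centroid gives a keypoint orientation. The sketch below illustrates this standard definition only and is not the implementation used in this paper. The function name, the use of plain Python lists, and the odd-sized square patch are assumptions made for illustration.

```python
import math


def intensity_centroid(patch):
    """Return (cx, cy, theta) for a square gray-level patch.

    Assumptions (illustrative, not from the paper): `patch` is a list of
    rows with an odd side length, coordinates are measured from the patch
    center, and y grows downward as in image coordinates.
    """
    half = len(patch) // 2
    m00 = m10 = m01 = 0.0
    for row_idx, row in enumerate(patch):
        y = row_idx - half
        for col_idx, value in enumerate(row):
            x = col_idx - half
            m00 += value        # zeroth moment: total intensity
            m10 += x * value    # first moment along x
            m01 += y * value    # first moment along y
    if m00 == 0:
        # An all-zero patch has no centroid; fall back to the center.
        return 0.0, 0.0, 0.0
    theta = math.atan2(m01, m10)  # orientation from center to centroid
    return m10 / m00, m01 / m00, theta
```

For a 3x3 patch whose only bright pixel lies directly to the right of the center, the centroid falls on that pixel and the orientation is 0 radians. Moving the bright pixel directly below the center gives an orientation of π/2, because y grows downward.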
2. RELATED WORK
Detector: The first step in image matching is detecting
interest points in the image, and many effective interest
point detectors have been proposed. The Harris corner detector
[7] gives a mathematical criterion for determining whether an
image patch is flat, an edge, or a corner. SIFT calculates
histograms of gray-level gradients and chooses the peak
orientation as the main direction. SURF uses block-pattern
approximations, which are faster than computing gradients.
FAST and its extensions [16, 17] are good choices for keypoint
detection in real-time systems. They find corner keypoints
stably and efficiently but are sensitive to scale changes.
Therefore, the FAST detector is often combined with pyramid
schemes to handle scale change.
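The flat/edge/corner decision attributed to the Harris detector above can be sketched with the usual response R = det(M) - k·trace(M)^2, where M sums products of image gradients over the patch. This is a minimal illustration under stated assumptions, not code from this paper: it uses central differences and a uniform window instead of a Gaussian one, k = 0.04 is a commonly used empirical value, and the threshold and function names are hypothetical.

```python
def harris_response(patch, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2 for one gray patch.

    Assumptions (illustrative): `patch` is a list of rows, gradients are
    central differences, and all interior pixels are weighted equally.
    """
    height, width = len(patch), len(patch[0])
    sxx = syy = sxy = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            ix = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            iy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


def classify_patch(patch, k=0.04, threshold=1e-6):
    """Label a patch as 'corner', 'edge', or 'flat' from the sign of R.

    `threshold` is a hypothetical tolerance around zero.
    """
    r = harris_response(patch, k)
    if r > threshold:
        return "corner"   # gradients vary strongly in two directions
    if r < -threshold:
        return "edge"     # gradients vary strongly in one direction only
    return "flat"         # almost no gradient energy
```

A constant patch yields R = 0 and is labeled flat, a vertical step edge yields a negative R, and a patch whose bright region fills one quadrant yields a positive R. In practice the threshold has to be tuned to the image intensity range.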