978-1-4799-0333-7/13/$31.00 © 2013 IEEE
Survey of Local Invariant Feature Description
Wei Huang, Yingmei Wei, Yuxiang Xie
School of Information System & Management
National University of Defense Technology
Changsha, China
Hongwei Jin
Academy of National Defense Information
Wuhan, China
Abstract—Local image feature description is a basic research
topic in the field of computer vision and an active area in the
community. This paper traces the development of local
feature description over the past decades. Then, based on the
feature pooling strategy, it classifies feature description methods into
three types: histogram-based methods, feature comparison based
methods and machine learning based methods. It gives a
comprehensive overview of the common methods in each class
and compares them in terms of computational complexity,
storage requirements and descriptor performance. Overall,
histogram-based methods perform best under a variety
of image distortions, feature comparison based methods have the
highest computational efficiency, and machine learning based
methods require less storage space. Finally, the challenges and
future development of local feature description are
discussed.
Keywords—local feature description; SIFT; LBP; random
projection; hash; LDA
I. INTRODUCTION
Over the last decade, local feature description has been an active
topic in the computer vision community. As a fundamental research area in
this field, it has been widely used in a number of
applications such as image retrieval [1, 2], image classification
[3, 4], specific target recognition [5, 6], wide baseline
registration [7, 8], 3D reconstruction [9, 10] and autonomous
navigation [11, 12].
The history of local invariant features can be traced back to
Moravec’s corner detector. Scale-space theory [13], which
was developed by Lindeberg in the 1990s, laid the theoretical
foundation of local invariant features. In 1999, Lowe proposed
the scale invariant feature transform (SIFT) [14], which can be
seen as a milestone in research on local invariant
features. Since then, a large number of local feature descriptors
have been proposed. The development of local invariant
features shows that invariance (scale invariance,
rotational invariance, gray-level invariance), rather than locality, is the crux
of the matter. By avoiding image
segmentation at the semantic level, the local invariant feature
provides a statistical representation of the image, which is the
reason for its success.
Local feature description aims to characterize a local image
patch. Its basic idea is to extract the essential attributes of the
image content. The features should be independent of the specific
form of the image and should adapt to variations of the
image, such as illumination changes, scale changes, rotation
and perspective changes. An ideal feature
descriptor should be distinctive and robust. Robustness mainly
refers to the ability to work stably under various image
transformations and distortions. Distinctiveness mainly refers to
the ability to capture and reflect differences when the feature
information in the image patch changes locally. In addition,
the local feature descriptor should have low computational
complexity in the extraction process and require little storage
space.
With the development of the theory and of various
applications, local feature description methods have emerged in
rapid succession. In this paper, we give a comprehensive
review of local invariant feature descriptors proposed after the
emergence of SIFT. We summarize the status of and trends in
recent local feature description methods.
II. CLASSIFICATION OF LOCAL FEATURE DESCRIPTORS
The whole process of local feature description can be
broken down into the following four steps: local patch acquisition,
feature extraction, feature pooling and vector normalization.
Firstly, we get a normalized local patch from the image.
Secondly, different low-level features such as intensity, color,
gradient or LBP are extracted from the local patch. Then, the
low-level features in the local patch are pooled to get a vector.
The final descriptor is constructed by normalizing the pooled
vector.
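The four steps above can be sketched in a few lines of code. The following is only a minimal illustration and not a descriptor from the literature: the function name describe_patch, the choice of gradient orientation as the low-level feature, the 8-bin histogram used for pooling and the L2 normalization are all assumptions made for this example.

```python
import math

def describe_patch(patch, n_bins=8):
    """Toy descriptor: gradient-orientation histogram of one gray patch.

    `patch` is a list of equal-length rows of numbers (at least 3x3).
    All design choices here are illustrative assumptions.
    """
    rows, cols = len(patch), len(patch[0])

    # Step 1 (patch acquisition): the caller is assumed to have cropped
    # the patch; here we only standardize intensities (zero mean, unit std).
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    norm = [[(v - mean) / std for v in row] for row in patch]

    # Step 2 (feature extraction) and step 3 (feature pooling): compute a
    # central-difference gradient at every interior pixel and accumulate
    # its magnitude into the histogram bin of its orientation.
    hist = [0.0] * n_bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (norm[r][c + 1] - norm[r][c - 1]) / 2.0
            gy = (norm[r + 1][c] - norm[r - 1][c]) / 2.0
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            bin_index = min(int(angle / (2.0 * math.pi) * n_bins), n_bins - 1)
            hist[bin_index] += math.hypot(gx, gy)

    # Step 4 (vector normalization): scale the pooled vector to unit L2 norm.
    length = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / length for h in hist]
```

For a patch whose intensity increases from left to right, every gradient points in the same direction, so all the mass falls into a single bin. Because the intensities are standardized in step 1, a linear change of brightness and contrast leaves the result essentially unchanged, which is one simple form of the gray-level invariance discussed in the introduction.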
In [15] (Mikolajczyk, 2005), feature
description methods were categorized into distribution-based descriptors,
spatial-frequency methods, differential descriptors and others. In
[16] (Li, 2008), the existing descriptors were categorized into the
following five types: filter-based descriptors, distribution-based
descriptors, textons, derivative-based descriptors and others. In
this paper, we review the various local feature descriptors
proposed after the emergence of SIFT and thus do not consider
the methods based on filters and moments. We
classify the local feature description methods according to their
feature pooling strategy and divide them into three
categories: histogram-based methods, feature comparison
based methods and machine learning based methods.
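To make the role of the pooling strategy concrete, the sketch below applies three different pooling functions to the same list of low-level feature values. It is a simplified illustration under stated assumptions, not a definition taken from the surveyed methods: the function names, the use of pairwise comparisons to produce bits, and the use of a fixed linear projection as a stand-in for a learned one are all hypothetical choices made for this example.

```python
def histogram_pool(features, n_bins=4):
    """Histogram-based pooling: count feature values (in [0, 1)) per bin."""
    hist = [0] * n_bins
    for f in features:
        hist[min(int(f * n_bins), n_bins - 1)] += 1
    return hist

def comparison_pool(features, pairs):
    """Comparison-based pooling: one bit per index pair (i, j),
    set to 1 when features[i] is larger than features[j]."""
    return [1 if features[i] > features[j] else 0 for i, j in pairs]

def projection_pool(features, matrix):
    """Learning-based pooling (stand-in): project the features with a
    matrix. In a real method the matrix would be learned from data;
    here it is simply supplied by the caller."""
    return [sum(w * f for w, f in zip(row, features)) for row in matrix]

# Example use on one toy feature vector (values are arbitrary).
features = [0.1, 0.6, 0.3, 0.9]
hist_vector = histogram_pool(features)
bit_vector = comparison_pool(features, [(0, 1), (1, 2), (3, 0)])
short_vector = projection_pool(features, [[1, 0, 0, 0], [0, 1, 1, 0]])
```

The three outputs differ in kind: a vector of counts, a short binary string, and a lower-dimensional real vector. This difference is what drives the trade-offs in performance, computational cost and storage compared in this survey.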
A. Histogram-based methods
These are the most commonly used methods for feature
description. In the feature pooling stage, a histogram is used to
aggregate the low-level features in the patch. An intuitive
descriptor is the histogram of all the features in the local
patch. In order to enhance the discriminability of the descriptor,
This work is supported by the National Natural Science Foundation of
China (61103081, 61201339).