synthesized images are generated automatically based on
transformation simulation, and then expanded features can
be extracted from these synthesized samples. To the best of
our knowledge, the ASIFT method proposed in [22] is the first
systematic framework to establish the whole affine space of
an image, by simulating 41 viewpoints with different affine
simulation parameters. This creative work has achieved
impressive performance in image matching applications.
However, for object detection in a large corpus, the algorithm
does not work well because it simply combines all features
without selection. This introduces many redundant features,
which not only cause false matches but also consume
memory and detection time. A 1024×768 image typically
contains about 3000 SIFT features; with the original ASIFT,
the feature count explodes to 41×3000, which becomes
a significant burden on indexing and matching. Even with
remote servers, the memory and time costs are unbearable
in large-scale mobile visual search applications.
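For concreteness, the tilt-and-rotation viewpoint sampling behind this feature explosion can be sketched as follows. This is a minimal illustration in the spirit of [22], assuming the geometric tilt series t = √2^k and a rotation step of 72°/t; the exact grid, and whether it yields exactly 41 views, depends on the chosen parameters and rounding conventions:

```python
import numpy as np

def affine_simulation_matrix(tilt, phi_deg):
    """Build the 2x3 affine matrix for one simulated viewpoint:
    an in-plane rotation by phi followed by a directional tilt
    (anisotropic scaling, tilt = 1/cos(theta)) that models
    out-of-plane camera rotation."""
    phi = np.deg2rad(phi_deg)
    R = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])
    T = np.array([[1.0, 0.0],
                  [0.0, 1.0 / tilt]])
    A = T @ R
    return np.hstack([A, np.zeros((2, 1))])  # no translation

def viewpoint_grid(max_k=5):
    """Enumerate (tilt, rotation) pairs: one frontal view, then for
    each tilt level a set of rotations with step 72/t degrees."""
    views = [(1.0, 0.0)]  # frontal view, no simulation needed
    for k in range(1, max_k + 1):
        t = np.sqrt(2.0) ** k
        step = 72.0 / t
        views += [(t, phi) for phi in np.arange(0.0, 180.0, step)]
    return views
```

Each matrix would then be applied to the image (e.g. with an affine warp) before running the feature detector, which makes clear why the descriptor count multiplies by the number of simulated views.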
In fact, it has been demonstrated that two images can be
correctly matched using only 3–10 pairs of matching features
[13,14,16]. Therefore, mining a set of representative features
that are both robust and informative is essential for
large-scale applications. The ASC method proposed in [23]
partially alleviates the problem by selecting stable features
under various neighboring viewpoints. However, the evaluation
criterion used for robust feature mining is too strict to
retain enough useful object characteristics. What is more, to
remain invariant under several viewpoints, features selected
in this way often sacrifice distinctiveness.
For large-scale mobile visual search, the distinctiveness of
local features is very important for fast indexing and accurate
matching. Current systems usually use high-dimensional
indexing methods to support large-scale feature matching,
such as tree-based indexing techniques [24], the visual words
method [14], and the LSH method [25]. Due to the lack of feature
distinctiveness, the initial search results often contain many
false matches, which must be filtered out by a complex post-
verification step [14,16,18]. Embedding geometric information
has proved very useful for increasing the distinctiveness
of local features. Perdoch et al. [26] discretize the shape of
local elliptical patches. Global geometric transformation has
been incorporated into the index by feature map hashing [27].
Poullot et al. [28] group spatially neighboring local features
and index triangles. However, these methods are not designed
to prevent query drift in query expansion, so time-consuming
geometric consistency validation is still unavoidable.
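As an illustration of the indexing side, a random-hyperplane LSH index over descriptor vectors might look like the following. This is a sketch of the general LSH idea rather than the specific scheme of [25]; the class and parameter names are our own:

```python
import numpy as np

class HyperplaneLSH:
    """Minimal random-hyperplane LSH index: descriptors hashing to
    the same sign pattern land in the same bucket, so a query only
    needs to be compared against one bucket, not the whole corpus."""

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = {}

    def _key(self, v):
        # Sign of the projection onto each random hyperplane.
        return tuple((self.planes @ v > 0).astype(np.uint8))

    def add(self, idx, v):
        self.buckets.setdefault(self._key(v), []).append(idx)

    def query(self, v):
        return self.buckets.get(self._key(v), [])
```

Because nearby descriptors tend to share sign patterns, candidates retrieved this way still include false matches when the descriptors themselves lack distinctiveness, which is exactly why the post-verification step discussed above becomes necessary.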
In summary, to generate an abundant and accurate
expanded feature set for each image in the dataset, a proper
mining criterion is of utmost importance. With this
extended information, we can support efficient online
detection with both high recall and precision in large-scale
mobile visual search.
2.3. Main contributions
To support large-scale mobile visual search with both
high recall and precision, this paper proposes an accurate
and efficient offline query expansion method, including
robust local patch mining and geometric parameter cod-
ing strategies. Our main contributions are as follows:
To improve recall in mobile visual search under var-
ious viewpoint transformations, a new criterion called
Entropy Loss Ratio (ELR) is presented for robust local
patch mining. Multiple representative features are then
extracted from these selected local patches to deal
with viewpoint changes. Compared with previous
work in [23], this criterion fully considers both the
robustness and the distinctiveness of each local patch,
and thus can generate an abundant and accurate
expanded feature set for each image in the dataset.
To deal with motion blur caused by unavoidable hand
trembling in mobile visual search, we extract features
from the Local Average Patch (LAP) around each corre-
sponding viewpoint, instead of from the original local patches.
To prevent query drift in query expansion, the
viewpoint parameters of the selected local patches are
recorded with each expanded feature. Based on an efficient
viewpoint consistency validation, false matches caused
by expanded features can be removed in advance,
supporting fast and accurate feature matching.
Experimental results on several well-known datasets
and a large image set (1M images) have demonstrated the
effectiveness and efficiency of our method. Compared
with state-of-the-art query expansion methods, we
achieve comparable detection accuracy with only 12%
of the memory cost of ASIFT [22], and our recall is much
higher than that of SIFT [12] and ASC [23], with a much
faster online search process.
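The viewpoint consistency validation mentioned in the third contribution could be realized, in its simplest form, as a majority vote over the viewpoint parameters recorded with each matched expanded feature. The sketch below is hypothetical: the function name and the match representation (index, (tilt, rotation)) are our own, not the paper's:

```python
from collections import Counter

def filter_by_viewpoint(matches):
    """Keep only matches whose recorded viewpoint parameters agree
    with the dominant viewpoint among all matches; matches hitting
    expanded features from other simulated views are dropped as
    likely false matches."""
    if not matches:
        return []
    votes = Counter(vp for _, vp in matches)
    dominant, _ = votes.most_common(1)[0]
    return [m for m in matches if m[1] == dominant]
```

Since this check is a single pass over the candidate matches, it is far cheaper than a full geometric verification such as RANSAC, which is the efficiency argument made above.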
3. Automatic feature set expansion
As introduced above, integrating offline affine simula-
tion and representative feature mining is an effective way
to obtain expanded features for each image in the dataset,
so that effective and efficient large-scale mobile visual
search can be supported with both high recall and precision.
To generate abundant and accurate extended informa-
tion, a proper criterion for local feature mining is very
important. However, the existing criterion in [23] is too strict
to retain representative and distinctive features. Therefore,
based on an important observation, we present a novel
method to obtain better expanded features, including robust
local patch mining and representative feature extraction.
3.1. Observation
The closest works to ours are the ASIFT method in [22] and
ASC proposed in [23]. Both approaches adopt offline
automatic sample expansion to deal with image affine
transformations caused by viewpoint changes. The main
difference between them is that ASIFT [22] uses no feature
selection mechanism and thus incurs serious memory and
time costs, while ASC [23] selects stable local features
under various viewpoints to alleviate this problem. However,
the ASC selection criterion is too strict to obtain enough
useful object characteristics. What is more, to be
invariant under several viewpoints, features selected in
this way often sacrifice distinctiveness.
As shown in Fig. 2, we observed an interesting phe-
nomenon: under various viewpoint changes, using exist-
ing local patch detectors and descriptors, local patches
K. Gao et al. / Signal Processing 93 (2013) 2305–2315 2307