synthesized images are generated automatically based on
transformation simulation, and then expanded features can
be extracted from these synthesized samples. To the best of
our knowledge, the ASIFT method proposed in [22] is the first
systematic framework to establish the whole affine space of
an image, by simulating 41 viewpoints with different affine
simulation parameters. This creative work has achieved
impressive performance in image matching applications.
However, for object detection in a large corpus, the algorithm
does not work well because it simply combines all features
without selection. This introduces many redundant features,
which not only cause false matches but also consume
memory and detection time. A 1024×768 image typically
contains about 3000 SIFT features; with the original ASIFT,
the feature count explodes to 41×3000, which becomes
a significant burden on indexing and matching. Even with
remote servers, the memory and time costs are unbearable
in large-scale mobile visual search applications.
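For concreteness, the tilt-and-rotation viewpoint sampling behind this feature explosion can be sketched as follows. This is a minimal illustration in the spirit of [22], assuming the geometric tilt series t = √2^k and a rotation step of 72°/t; the exact grid, and whether it yields exactly 41 views, depends on the chosen parameters and rounding conventions:

```python
import numpy as np

def affine_simulation_matrix(tilt, phi_deg):
    """Build the 2x3 affine matrix for one simulated viewpoint:
    an in-plane rotation by phi followed by a directional tilt
    (anisotropic scaling, tilt = 1/cos(theta)) that models
    out-of-plane camera rotation."""
    phi = np.deg2rad(phi_deg)
    R = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])
    T = np.array([[1.0, 0.0],
                  [0.0, 1.0 / tilt]])
    A = T @ R
    return np.hstack([A, np.zeros((2, 1))])  # no translation

def viewpoint_grid(max_k=5):
    """Enumerate (tilt, rotation) pairs: one frontal view, then for
    each tilt level a set of rotations with step 72/t degrees."""
    views = [(1.0, 0.0)]  # frontal view, no simulation needed
    for k in range(1, max_k + 1):
        t = np.sqrt(2.0) ** k
        step = 72.0 / t
        views += [(t, phi) for phi in np.arange(0.0, 180.0, step)]
    return views
```

Each matrix would then be applied to the image (e.g. with an affine warp) before running the feature detector, which makes clear why the descriptor count multiplies by the number of simulated views.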
In fact, it has been demonstrated that two images can be
correctly matched using only 3–10 pairs of matching features
[13,14,16]. Therefore, mining a set of representative features
that are both robust and informative is essential for
large-scale applications. The ASC method proposed in [23]
partially alleviates the problem by selecting stable features
under various neighboring viewpoints. However, the evaluation
criterion used for robust feature mining is too strict to
retain enough useful object characteristics. What is more, to
remain invariant under several viewpoints, features selected
in this way often sacrifice distinctiveness.
For large-scale mobile visual search, the distinctiveness of
local features is very important for fast indexing and accurate
matching. Current systems usually use high-dimensional
indexing methods to support large-scale feature matching,
such as tree-based indexing techniques [24], the visual words
method [14], and the LSH method [25]. Due to the lack of feature
distinctiveness, the initial search results often contain many
false matches, which must be filtered out by a complex post-
verification step [14,16,18]. Embedding geometric information
has proved very useful for increasing the distinctiveness
of local features. Perdoch et al. [26] discretize the shape of
local elliptical patches. Global geometric transformation has
been incorporated into the index by feature map hashing [27].
Poullot et al. [28] group spatially neighboring local features
and index triangles. However, these methods are not designed
to prevent query drift in query expansion, so time-consuming
geometric consistency validation is still unavoidable.
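As an illustration of the indexing side, a random-hyperplane LSH index over descriptor vectors might look like the following. This is a sketch of the general LSH idea rather than the specific scheme of [25]; the class and parameter names are our own:

```python
import numpy as np

class HyperplaneLSH:
    """Minimal random-hyperplane LSH index: descriptors hashing to
    the same sign pattern land in the same bucket, so a query only
    needs to be compared against one bucket, not the whole corpus."""

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = {}

    def _key(self, v):
        # Sign of the projection onto each random hyperplane.
        return tuple((self.planes @ v > 0).astype(np.uint8))

    def add(self, idx, v):
        self.buckets.setdefault(self._key(v), []).append(idx)

    def query(self, v):
        return self.buckets.get(self._key(v), [])
```

Because nearby descriptors tend to share sign patterns, candidates retrieved this way still include false matches when the descriptors themselves lack distinctiveness, which is exactly why the post-verification step discussed above becomes necessary.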
In summary, to generate an abundant and accurate
expanded feature set for each image in the dataset, a proper
mining criterion is of utmost importance. With this
extended information, we can support efficient online
detection with both high recall and precision in large-scale
mobile visual search.
2.3. Main contributions
To support large-scale mobile visual search with both
high recall and precision, this paper proposes an accurate
and efficient offline query expansion method, including
robust local patch mining and geometric parameter cod-
ing strategies. Our main contributions are as follows:
To improve recall in mobile visual search under var-
ious viewpoint transformations, a new criterion called
Entropy Loss Ratio (ELR) is presented for robust local
patch mining. Multiple representative features are then
extracted from these selected local patches to deal
with viewpoint changes. Compared with previous
work in [23], this criterion fully considers both the
robustness and the distinctiveness of each local patch,
and thus can generate an abundant and accurate
expanded feature set for each image in the dataset.
To deal with motion blur caused by unavoidable hand
trembling in mobile visual search, we extract features
from the Local Average Patch (LAP) around each corre-
sponding viewpoint, instead of from the original local patches.
To prevent query drift in query expansion, the
viewpoint parameters of the selected local patches are
recorded with each expanded feature. Based on an efficient
viewpoint consistency validation, false matches caused
by expanded features can be removed in advance,
supporting fast and accurate feature matching.
Experimental results on several well-known datasets
and a large image set (1M images) have demonstrated the
effectiveness and efficiency of our method. Compared
with state-of-the-art query expansion methods, we
achieve comparable detection accuracy with only 12%
of the memory cost of ASIFT [22], and our recall is much
higher than that of SIFT [12] and ASC [23], with a much
faster online search process.
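The viewpoint consistency validation mentioned in the third contribution could be realized, in its simplest form, as a majority vote over the viewpoint parameters recorded with each matched expanded feature. The sketch below is hypothetical: the function name and the match representation (index, (tilt, rotation)) are our own, not the paper's:

```python
from collections import Counter

def filter_by_viewpoint(matches):
    """Keep only matches whose recorded viewpoint parameters agree
    with the dominant viewpoint among all matches; matches hitting
    expanded features from other simulated views are dropped as
    likely false matches."""
    if not matches:
        return []
    votes = Counter(vp for _, vp in matches)
    dominant, _ = votes.most_common(1)[0]
    return [m for m in matches if m[1] == dominant]
```

Since this check is a single pass over the candidate matches, it is far cheaper than a full geometric verification such as RANSAC, which is the efficiency argument made above.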
3. Automatic feature set expansion
As introduced above, integrating offline affine simula-
tion and representative feature mining is an effective way
to obtain expanded features for each image in the dataset,
so that effective and efficient large-scale mobile visual
search can be supported with both high recall and precision.
To generate abundant and accurate extended informa-
tion, a proper criterion for local feature mining is very
important. However, the existing criterion in [23] is too strict
to retain representative and distinctive features. Therefore,
based on an important observation, we present a novel
method to obtain better expanded features, including robust
local patch mining and representative feature extraction.
3.1. Observation
The closest works to ours are the ASIFT method in [22] and
ASC proposed in [23]. Both approaches adopt offline
automatic sample expansion to deal with image affine
transformations caused by viewpoint changes. The main
difference between them is that ASIFT [22] uses no feature
selection mechanism and thus incurs serious memory and
time costs, while ASC [23] selects stable local features
under various viewpoints to alleviate this problem. However,
the ASC selection criterion is too strict to obtain enough
useful object characteristics. What is more, to be
invariant under several viewpoints, features selected in
this way often sacrifice distinctiveness.
As shown in Fig. 2, we observed an interesting phe-
nomenon: under various viewpoint changes, using exist-
ing local patch detectors and descriptors, local patches
K. Gao et al. / Signal Processing 93 (2013) 2305–2315 2307