Maximally Visual-Homogeneous Region Detector
for Large Scale Image Retrieval
Gang Wang, Ke Gao, Jintao Li
Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS)
Institute of Computing Technology, CAS, Beijing 100190, China
{wanggang01, kegao, jtli}@ict.ac.cn
ABSTRACT
Conventional local detectors often extract numerous small
repeated regions in textured areas, which easily results in
false matching. In order to find representative and
distinctive local invariant regions, this paper proposes a
Maximally Visual-Homogeneous Region (MVHR) detector.
The main contributions are twofold: (1)
Unlike the original MSER, which uses single-pixel intensity
as the ranking unit, we propose a novel sorting
method based on visual homogeneity analysis of a local
patch. (2) Because the observation scale is closely
related to visual homogeneity analysis, we develop a heuristic
scale selection algorithm that chooses a proper
scale according to the changes in the visual homogeneity
evaluation over a range of scales. Experiments demonstrate
that our detector finds fewer but representative regions with
high repeatability, while still preserving competitive
precision compared to state-of-the-art detectors for large
scale image retrieval.
Categories and Subject Descriptors
H.5.1 [Multimedia Information Systems]: Methodology
General Terms
Algorithms, Design, Performance, Theory.
Keywords
local feature detector, visual homogeneity, scale selection
1. INTRODUCTION
Using a local detector to extract regions of interest is an
important statistical method for representing an image [10]. For
robust image matching, a “qualified” local feature detector
should be distinctive and repeatable under various image
transformations.
MSER [6] is one of the most popular local feature
detectors with good performance [7]. In principle, the
ICMR’15, June 23–26, 2015, Shanghai, China.
Copyright 2015 ACM 978-1-4503-3274-3/15/06 ...$15.00.
http://dx.doi.org/10.1145/2671188.2749315.
(a) Harris, 406 (b) SIFT, 998
(c) MSER, 260 (d) Ours, 1
Figure 1: Output regions detected by local feature
detectors and their corresponding numbers. The
red contours denote regions of interest detected by
four methods. Intensity-based methods (Harris,
SIFT, MSER) often split the integral object into
“pieces”, while the proposed detector can find few
but representative visual-homogeneous regions. The
blue contour shows the exact boundary detected by
our method.
method searches for closed regions that achieve locally
maximal stability over a range of gray values. In
textured areas, however, it extracts many
small, redundant patches. Figure 1 (c) shows an
example detection result on a handbag, which is covered by
many black and yellow blobs. A similar phenomenon
occurs with Harris and SIFT, as shown in Figure 1 (a) and (b). These
regions have similar appearances after normalization.
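The stability criterion mentioned above can be illustrated with a minimal sketch. It assumes the commonly cited MSER measure, the relative change in area of a connected component as the gray-level threshold moves by a margin `delta`; the function names, the seed-based formulation, and the toy image are hypothetical and chosen only for illustration. This is not an efficient MSER implementation, which would use a component tree rather than repeated flood fills.

```python
from collections import deque


def component_area(img, seed, t):
    """Area of the 4-connected component of pixels <= t containing seed."""
    h, w = len(img), len(img[0])
    sy, sx = seed
    if img[sy][sx] > t:
        return 0  # the seed itself is above the threshold
    seen = {(sy, sx)}
    queue = deque([(sy, sx)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and (ny, nx) not in seen and img[ny][nx] <= t):
                seen.add((ny, nx))
                queue.append((ny, nx))
    return len(seen)


def stability(img, seed, t, delta):
    """Relative area change around threshold t; smaller means more stable."""
    area = component_area(img, seed, t)
    if area == 0:
        return float("inf")
    grown = component_area(img, seed, t + delta)
    shrunk = component_area(img, seed, t - delta)
    return (grown - shrunk) / area


# Toy image (assumed): a dark 2x2 blob on a bright background.
img = [
    [200, 200, 200, 200],
    [200, 10, 10, 200],
    [200, 10, 10, 200],
    [200, 200, 200, 200],
]
```

On this toy image the blob keeps the same area for every threshold between 10 and 199, so `stability(img, (1, 1), 100, 10)` is zero, while near the background level the component suddenly grows to the whole image and the measure becomes large. In a textured area, each small blob passes this test on its own, which is why many small regions are reported.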
To deal with this problem, our major
contribution extends MSER to be aware of perceptual
consistency. Visual homogeneity is defined on a window of a
certain size according to the spatial distribution of color
classes. It serves as the basic visual unit instead of the gray
value used in MSER. As shown in Figure 1 (d), the proposed
MVHR finds fewer but representative regions and
resolves the problem described above.
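To make the idea of a window-based homogeneity measure concrete, the following is a rough sketch under stated assumptions. It is not the definition used by MVHR, which is not given in this section. It assumes that each pixel already carries an integer color-class label, and it scores a square window by how well the class histogram of each quadrant matches that of the whole window (histogram intersection). All names (`class_histogram`, `intersection`, `window_homogeneity`) are hypothetical.

```python
from collections import Counter


def class_histogram(labels, y0, y1, x0, x1):
    """Normalized histogram of class labels in rows y0..y1-1, cols x0..x1-1."""
    counts = Counter(labels[y][x] for y in range(y0, y1) for x in range(x0, x1))
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def intersection(p, q):
    """Histogram intersection in [0, 1]; 1 means identical distributions."""
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))


def window_homogeneity(labels, top, left, size):
    """Worst-case agreement between each quadrant and the whole window.

    Assumption: size is even and the window lies inside the label map.
    """
    half = size // 2
    whole = class_histogram(labels, top, top + size, left, left + size)
    quads = [
        class_histogram(labels, top + dy, top + dy + half,
                        left + dx, left + dx + half)
        for dy in (0, half) for dx in (0, half)
    ]
    return min(intersection(whole, q) for q in quads)


# Toy label maps (assumed): a fine two-class texture and a hard edge.
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
edge = [[0, 0, 1, 1] for _ in range(4)]
```

With these toy inputs, the checkerboard texture scores 1.0 on a 4x4 window because every quadrant has the same class mix, whereas the hard edge scores 0.5. The same checkerboard scores only 0.5 on a 2x2 window, which illustrates that a measure of this kind depends on the window size at which the texture is observed.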
Because the observation scale has a great influence on
visual homogeneity evaluation, our other