OBSIR: OBJECT-BASED STEREO IMAGE RETRIEVAL
Xiangyang Xu, Wenjing Geng, Ran Ju, Yang Yang, Tongwei Ren and Gangshan Wu
State Key Laboratory for Novel Software Technology
Nanjing University, Nanjing, China
xiangyang.xu@smail.nju.edu.cn, jenngeng@gmail.com, juran@smail.nju.edu.cn,
charlie.yang.nju@gmail.com, {rentw, gswu}@nju.edu.cn
ABSTRACT
In recent years, stereo images have become an emerging
medium in the field of 3D technology, which creates an
urgent demand for stereo image retrieval. In this paper, we
introduce a framework for object-based stereo
image retrieval (OBSIR), which retrieves images containing
objects similar to the one the user selects in the query
image. The proposed approach consists of both online
and offline procedures. In the offline procedure, we propose
a salient object segmentation method making use of both
color and depth to extract objects from each image. The
extracted objects are then represented by multiple visual
feature descriptors. To make image search
efficient, we construct an approximate nearest neighbor
(ANN) index using cluster-based locality sensitive hashing
(LSH). In the online stage, the user may supply the query
object by selecting a region of interest (ROI) in the query
image, or clicking one of the objects recommended by the
salient object detector. To evaluate retrieval, we
build a new dataset containing over 10K stereo images.
Experiments on this dataset show that the proposed method
effectively recommends the correct object and that the final
retrieval results are better than those of the baseline methods.
Index Terms— Stereo image retrieval, object retrieval,
salient object detection, query object recommendation
1. INTRODUCTION
3D devices such as stereo cameras and 3DTVs have seen
explosive growth in industry, and stereo images have
become an emerging medium that is widespread in daily
life. With the sharp increase in stereo image data,
managing and accessing it efficiently has become an
urgent problem, much as it was for digital images
about two decades ago [1].
In the world of monoscopic images, content-based image
retrieval (CBIR) retrieves relevant images from example
images, while object-based image retrieval (OBIR)
methods [2, 3, 4] search with a region of interest
that the user marks as the desired object.
Unfortunately, little research has addressed stereo
image retrieval. To address this problem,
we introduce a complete framework for object-based stereo
image retrieval. First, a preprocessing step comprising stereo
rectification and stereo matching produces a
disparity map for each image, which encodes the depth
information. Second, the object segmentation procedure is
performed by a salient object detector making use of depth
information. Then, multiple visual features, including
bag-of-visual-words (BoVW), are extracted to
represent the objects. Finally, the feature vectors are
indexed by cluster-based LSH. In the online search
phase, the user is first asked to upload an example image. To
select the query object, the user may either drag a region of
interest or pick an object from the query recommendations.
Based on LSH indexing, a list of stereo images is returned
to the user efficiently. To evaluate the effectiveness of the
proposed framework, we build a new stereo image dataset
called “OBSIR dataset”. In summary, our major contributions
include:
• A novel framework for object-based stereo image
retrieval;
• A salient object detection method that yields
better object segmentation and provides query
recommendations in the user interaction phase;
• A novel stereo image dataset designed for image
retrieval evaluation, drawn from three common
sources: websites, everyday photography, and stereo
movies.
To the best of our knowledge, our work is the
first attempt to explicitly establish a systematic framework
for object-based stereo image retrieval, and also the first
to build a stereo image dataset for evaluating the stereo
retrieval task.
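As a rough illustration of the indexing and online search steps outlined above, the following minimal sketch builds an ANN index over object feature vectors and queries it. It is not the cluster-based LSH used in this paper: it substitutes plain random-hyperplane LSH as a simpler stand-in, and the class name `HyperplaneLSH`, its parameters (`n_bits`, `n_tables`), and the cosine re-ranking step are all assumptions made for the example.

```python
import numpy as np
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH (illustrative stand-in, NOT the paper's cluster-based LSH)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        # One independent set of random hyperplanes per hash table.
        self.planes = rng.standard_normal((n_tables, n_bits, dim))
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    def _keys(self, v):
        # The sign pattern of the projections is the bucket key in each table.
        bits = (self.planes @ v) > 0  # shape: (n_tables, n_bits)
        return [tuple(row.tolist()) for row in bits]

    def add(self, item_id, v):
        # Offline step: store the feature vector and hash it into every table.
        v = np.asarray(v, dtype=float)
        self.vectors[item_id] = v
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(item_id)

    def query(self, q, k=5):
        # Online step: collect items sharing a bucket with the query,
        # then re-rank those candidates by cosine similarity.
        q = np.asarray(q, dtype=float)
        candidates = set()
        for table, key in zip(self.tables, self._keys(q)):
            candidates.update(table.get(key, []))

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

        ranked = sorted(candidates, key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return ranked[:k]


# Hypothetical usage: 16-dimensional random vectors stand in for object descriptors.
rng = np.random.default_rng(1)
data = rng.standard_normal((50, 16))
index = HyperplaneLSH(dim=16)
for i, v in enumerate(data):
    index.add(i, v)
top = index.query(data[3], k=5)  # ids of stored objects most similar to object 3
```

Using several hash tables trades memory for recall: a true neighbor missed in one table may still share a bucket with the query in another, and only the resulting candidate set is compared exhaustively.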
The remainder of this paper is organized as follows.
Section 2 briefly reviews related work.
Section 3 gives a system overview and describes the approach
in detail, and Section 4 evaluates it with experiments on the
OBSIR dataset. Finally, Section 5 concludes the paper
with some remarks on future work.