Random Subspace for Binary Codes Learning in Large
Scale Image Retrieval
Cong Leng, Jian Cheng, Hanqing Lu
National Laboratory of Pattern Recognition
Institute of Automation, Chinese Academy of Sciences
Beijing, China
{cong.leng, jcheng, luhq}@nlpr.ia.ac.cn
ABSTRACT
Due to the fast query speed and low storage cost, hashing-based
approximate nearest neighbor search methods have
attracted much attention recently. Many state-of-the-art
methods are based on eigenvalue decomposition. In these
approaches, the information captured by different dimensions
is unbalanced, and most of the information is generally
concentrated in the top eigenvectors. We demonstrate that this
leads to an unexpected phenomenon: longer hashing
codes do not necessarily yield better performance. In this
work, we introduce a random subspace strategy to address
this limitation. Each time, a small fraction of the whole feature
space is randomly sampled to train the hashing algorithm,
and only the top eigenvectors are kept to generate
one piece of short code. This process is repeated several
times, and the resulting pieces of short code
are concatenated into one long code. Theoretical
analysis and experiments on two benchmarks confirm the
effectiveness of the proposed strategy for hashing.
Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Search and Retrieval
General Terms
Algorithms, Experimentation, Measurement
Keywords
Image Retrieval, Random Subspace, Binary Codes, Hamming Ranking
1. INTRODUCTION
With the rapid development of the Internet, large-scale visual
databases with high dimensionality are everywhere on the
Web. These huge databases pose significant challenges to visual
search, since exhaustive linear search is extremely time-
consuming. To address this issue, many hashing-based methods
for approximate nearest neighbor (ANN) search have
been proposed recently [6, 1, 7, 2, 3]. In these approaches,
hash functions are learned to map nearby points in the
original space to similar binary codes. Searching with binary
codes is very fast because the Hamming distance
between them can be computed efficiently on modern
CPUs.
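For intuition, the following minimal sketch (not code from this paper) shows why binary codes admit fast search: the Hamming distance reduces to an XOR followed by a population count over packed code words. The packing into uint64 words and the lookup-table popcount are illustrative assumptions.

import numpy as np

def hamming_distances(query_code, db_codes):
    # Hamming distance between one query code and a database of codes,
    # where codes are packed into uint64 words: XOR, then count set bits.
    xor = np.bitwise_xor(db_codes, query_code)           # (n, words)
    table = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
    return table[xor.view(np.uint8)].sum(axis=1)         # popcount per row

# Toy usage: 10,000 database items with 64-bit codes.
rng = np.random.default_rng(0)
db = rng.integers(0, 2**63, size=(10000, 1), dtype=np.uint64)
q = db[42]                                               # query identical to item 42
print(hamming_distances(q, db)[42])                      # 0: identical codes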
As one main branch of existing hashing methods, eigenvalue
decomposition based approaches [8, 7, 2, 9] have
attracted much attention. Spectral Hashing (SH) [8] treats
hashing as a spectral embedding problem and
computes the bits by thresholding a subset of eigenvectors
of the graph Laplacian. Anchor Graph Hashing (AGH) [7]
follows the same idea as SH but utilizes an anchor graph to obtain
tractable low-rank adjacency matrices. PCAH [2] simply
generates linear hash functions from PCA projections,
i.e., the eigenvectors of the data covariance matrix.
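As a concrete illustration of the PCAH scheme just described, here is a minimal sketch (not the reference implementation of [2]): center the data, take the top eigenvectors of the covariance matrix as linear hash functions, and threshold the projections at zero.

import numpy as np

def train_pcah(X, num_bits):
    # X: (n, d) training matrix. Returns the data mean and a (d, num_bits)
    # projection matrix W holding the top eigenvectors of the covariance matrix.
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)                # ascending eigenvalues
    W = eigvecs[:, np.argsort(eigvals)[::-1][:num_bits]]  # top-k directions
    return mean, W

def encode_pcah(X, mean, W):
    # Binarize by thresholding the PCA projections at zero.
    return ((X - mean) @ W > 0).astype(np.uint8)          # (n, num_bits)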
For these eigenvalue decomposition based methods, the
variances of different projected dimensions typically differ,
and thus the information captured by different dimensions
is unbalanced. In general, the top eigenvectors carry
most of the information (variance), while the remaining ones are
usually less informative or even noisy. This results in
an unexpected phenomenon: longer hashing codes do
not necessarily yield better performance. As highlighted in
Figure 1, when the code length exceeds 8, increasing the number
of bits leads to poorer mean average precision (MAP)
performance on MNIST with both PCAH and AGH. Some
recent work such as Iterative Quantization (ITQ) [2] has
been proposed to overcome this problem, but there is still
no theoretical guarantee that longer codes will give better
results than shorter ones. Furthermore, the dimensionality
of visual data is generally very high, and this is
the main difficulty widely encountered in many eigenvalue
decomposition based methods, e.g., the time complexity of
PCA is $O(nd^2 + d^3)$, where $n$ is the size of the training set and
$d$ is the dimensionality of the data.
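To make this imbalance concrete, the short sketch below (illustrative only; the synthetic data and its decay factor are assumptions, standing in for real descriptors such as GIST) computes the fraction of variance each of the top PCA directions would contribute to a code: the first few directions dominate, so the bits generated from later eigenvectors add little signal.

import numpy as np

def variance_profile(X, num_bits):
    # Fraction of the top-num_bits variance carried by each PCA direction.
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]              # descending eigenvalues
    top = eigvals[:num_bits]
    return top / top.sum()

# Synthetic features with geometrically decaying per-dimension scale.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 64)) * (0.8 ** np.arange(64))
print(variance_profile(X, 16).round(3))                  # first few bits dominate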
In this work, we attempt to leverage the random subspace
strategy to address the problems mentioned above for
binary codes learning. We aim to concatenate many pieces
of short code generated with an eigenvalue decomposition
based method into one piece of long code, with the expectation
that the longer code will be “stronger”. However, it
is clear that if the many pieces of short code are identical,
the resulting long code will not capture more information
and will yield the same retrieval result as the short code.
Inspired by random decision forests [4], we adopt the ran-