
As one of the most popular hashing methods, LSH randomly samples hash functions
from a locality sensitive function family [1]. SimHash [4,13] and MinHash [3,20] are
two widely adopted LSH schemes. MinHash is a technique for quickly estimating the Jaccard coefficient of two sets, i.e., the resemblance similarity defined over binary vectors [3]. In contrast, SimHash is an LSH scheme for similarities (e.g., cosine similarity) that are defined on general real-valued data. As indicated in [21], when the data
are high-dimensional and binary, MinHash tends to work better than SimHash. On the
other hand, SimHash achieves better performance than MinHash on real-valued data.
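For intuition, the following is a minimal Python sketch of how MinHash can estimate the Jaccard coefficient of two sets; it is not taken from the cited papers, and the signature length and the salted-hash construction (standing in for true random permutations) are illustrative assumptions.

    import random

    def minhash_signature(items, num_hashes=128, seed=0):
        # One signature entry per salted hash function (a stand-in for a
        # random permutation): the minimum hash value over the set's elements.
        rng = random.Random(seed)
        salts = [rng.getrandbits(32) for _ in range(num_hashes)]
        return [min(hash((salt, x)) for x in items) for salt in salts]

    def estimate_jaccard(sig_a, sig_b):
        # The fraction of positions where the two signatures agree
        # estimates the Jaccard coefficient of the two sets.
        matches = sum(a == b for a, b in zip(sig_a, sig_b))
        return matches / len(sig_a)

    a, b = {"cat", "dog", "fish"}, {"cat", "dog", "bird"}
    sa, sb = minhash_signature(a), minhash_signature(b)
    print(estimate_jaccard(sa, sb))  # close to |a ∩ b| / |a ∪ b| = 0.5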
Specifically, to approximate the cosine similarity, Charikar [4] defined a hash function
h as:
h(q) = \begin{cases} 1, & \text{if } w \cdot q > 0 \\ 0, & \text{if } w \cdot q < 0 \end{cases} \qquad (1)
where w is a random vector drawn from the d-dimensional Gaussian distribution N(0, I_d).
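A minimal Python/NumPy sketch of the random hyperplane hash in Eq. (1) follows; the code length, the data layout, and the handling of the measure-zero case w · q = 0 are illustrative assumptions.

    import numpy as np

    def simhash_codes(X, num_bits=64, seed=0):
        # X: (n, d) real-valued data matrix.
        # Each bit is h(q) = 1 if w . q > 0 else 0, with w ~ N(0, I_d)
        # as in Eq. (1); one random Gaussian vector per bit.
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((X.shape[1], num_bits))
        return (X @ W > 0).astype(np.uint8)

    X = np.random.randn(5, 100)
    codes = simhash_codes(X, num_bits=32)
    # For two vectors, the expected fraction of differing bits equals
    # theta / pi, where theta is the angle between them.
    print(codes.shape)  # (5, 32)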
Although they enjoy many nice theoretical properties, these random-projection-based, data-independent hashing methods are not discriminative with respect to the data and typically need very long codes to achieve satisfactory search performance.
Recently, many data-dependent hashing methods [26,18,15,11,24,29,16,7,9] have been proposed to learn data-aware hash functions. As we have mentioned, many of them [26,18,24,29,8,7] are based on the eigendecomposition of a matrix (e.g., a Laplacian matrix). This leads to an imbalance problem, since the amount of information captured by different eigenvectors is unequal. A few recent works have attempted to address this problem.
In [25], instead of learning all the eigenvectors at once, Wang et al. proposed a sequential learning framework (USPLH) in which each hash function is learned to correct the errors made by the previous ones. Inspired by multiclass spectral clustering [28],
in Iterative Quantization (ITQ) [7], Gong et al. proposed an alternating minimization scheme that learns an orthogonal transformation of the PCA-projected data so as to minimize the quantization error of mapping the data to their corresponding binary codes (the vertices of a binary hypercube); a minimal sketch of this alternating scheme is given after this paragraph. In Isotropic Hashing (IsoH) [15], Kong et al. proposed to learn projection functions that produce projected dimensions with equal variances. As in ITQ, they learn an orthogonal transformation of the PCA-projected data, here by iteratively minimizing the reconstruction error between the covariance matrix and a diagonal matrix. A similar idea was adopted in [27], in which the PCA projection was replaced with locality preserving projection (LPP) [10].
In these methods, longer codes often capture much more information and thus give better experimental results than shorter ones. However, to the best of our knowledge, there is still no adequate theoretical guarantee, as there is for LSH, that performance improves as the code size increases.
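The sketch below illustrates the ITQ-style alternating minimization mentioned above; it is not the authors' reference code, and the iteration count, code length, and random orthogonal initialization are assumptions. It alternates between quantizing the rotated data and solving an orthogonal Procrustes problem for the rotation, which is the scheme described for ITQ [7].

    import numpy as np

    def itq_rotation(V, n_iter=50, seed=0):
        # V: (n, c) zero-centered, PCA-projected data.
        # Alternate between (1) fixing R and quantizing B = sign(V R), and
        # (2) fixing B and updating R by solving the orthogonal Procrustes
        # problem min_R ||B - V R||_F via an SVD.
        rng = np.random.default_rng(seed)
        R, _ = np.linalg.qr(rng.standard_normal((V.shape[1], V.shape[1])))
        for _ in range(n_iter):
            B = np.sign(V @ R)                 # binary codes in {-1, +1}
            U, _, Vt = np.linalg.svd(B.T @ V)  # SVD for the Procrustes step
            R = (U @ Vt).T
        return R

    X = np.random.randn(200, 64)
    X -= X.mean(axis=0)
    # PCA to c = 16 dimensions (top right singular vectors of the centered data).
    _, _, components = np.linalg.svd(X, full_matrices=False)
    V = X @ components[:16].T
    R = itq_rotation(V)
    codes = (np.sign(V @ R) > 0).astype(np.uint8)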
The differences between our method and these previous works are clear. Instead of minimizing the quantization error or strictly requiring each dimension to have equal variance, we leverage the bootstrap sampling scheme and integrate it with PCA. In each round, only the informative top eigenvectors are used to learn a short binary code. Owing to the well-established theory of ensemble learning, our method enjoys several advantages that previous works lack.
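Since this section describes the scheme only at a high level, the following Python fragment is merely one possible illustration of a bootstrap-plus-PCA ensemble and is not the authors' algorithm; the number of rounds, the bits per round, the zero thresholding of the centered projections, and the name bootstrap_pca_codes are all assumptions made for illustration.

    import numpy as np

    def bootstrap_pca_codes(X, n_rounds=8, bits_per_round=4, seed=0):
        # Illustrative sketch only: each round draws a bootstrap sample,
        # runs PCA on it, keeps the top `bits_per_round` eigenvectors,
        # and thresholds the projections of the full (centered) data at zero.
        # The final code concatenates the short per-round codes.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        Xc = X - X.mean(axis=0)
        codes = []
        for _ in range(n_rounds):
            idx = rng.integers(0, n, size=n)          # bootstrap sample
            S = Xc[idx]
            cov = S.T @ S / n
            eigvals, eigvecs = np.linalg.eigh(cov)
            top = eigvecs[:, np.argsort(eigvals)[::-1][:bits_per_round]]
            codes.append((Xc @ top > 0).astype(np.uint8))
        return np.hstack(codes)                        # (n, n_rounds * bits_per_round)

    X = np.random.randn(500, 32)
    B = bootstrap_pca_codes(X)
    print(B.shape)  # (500, 32)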