Deep Self-taught Hashing for Image Retrieval
Ke Zhou
Huazhong University of
Science and Technology
k.zhou@hust.edu.cn
Yu Liu
Huazhong University of
Science and Technology
lightyear416@163.com
Jingkuan Song
University of Trento
jingkuan.song@unitn.it
Linyu Yan
Hubei University of
Technology
yanranyaya@hust.edu.cn
Fuhao Zou
Huazhong University of
Science and Technology
fuhao_zou@hust.edu.cn
Fumin Shen
University of Electronic
Science and Technology of
China
fumin.shen@gmail.com
ABSTRACT
Hashing is a promising technique for tackling the problem of
scalable retrieval, and it generally consists of two major com-
ponents, namely hash code generation and hash function
learning. The majority of existing hashing methods fall under
the shallow model, which is intrinsically weak at mining ro-
bust visual features and learning complicated hash func-
tions. In view of the superiority of deep structures, espe-
cially Convolutional Neural Networks (CNNs), at ex-
tracting high-level representations, we propose a deep self-
taught hashing (DSTH) framework that combines deep struc-
tures with hashing to improve retrieval performance by
automatically learning robust visual features and hash func-
tions. By employing CNNs, more robust and discrimina-
tive image features can be extracted to benefit
hash code generation. Then, we apply CNNs and a Multi-
Layer Perceptron under a deep learning scheme to learn the hash
function in a supervised process, using the generated hash
codes as labels. The experimental results show that
DSTH is superior to several state-of-the-art algorithms.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information
Search and Retrieval; I.5.2 [Pattern Recognition]: Design
Methodology; classifier design and evaluation
Keywords
Data Hashing; Deep Learning; Self-taught; Convolutional
Neural Networks
1. INTRODUCTION
Hashing is a promising technique for similarity
search over large-scale datasets. In essence, it is a special form
of dimensionality reduction, mapping high-dimensional features
to compact hash codes.

MM'15, October 26-30, 2015, Brisbane, Australia.
© 2015 ACM. ISBN 978-1-4503-3459-4/15/10 ...$15.00.
DOI: http://dx.doi.org/10.1145/2733373.2806320.

Since the Hamming distance
between two binary hash codes can be computed efficiently
by using a bitwise XOR operation and counting the number of
non-zero bits, an ordinary PC today can perform
millions of Hamming distance computations in just a few
milliseconds. As a result, hashing shows incomparable su-
periority in fast similarity search. Currently, the research
on hashing confronts two challenges: first, how to precisely
extract representative image features; second, how to tackle
the problem of the semantic gap, enabling an accurate transfor-
mation from image features to hash codes.
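As an aside, the XOR-and-popcount computation described above can be sketched in a few lines of Python; the function name is ours, chosen for illustration:

```python
def hamming_distance(a: int, b: int) -> int:
    """Hamming distance between two equal-length binary codes
    stored as integers: XOR them, then count the set bits."""
    return bin(a ^ b).count("1")

# Example: 0b1011 and 0b0010 differ in two bit positions.
print(hamming_distance(0b1011, 0b0010))  # -> 2
```

Because the whole comparison reduces to one XOR and one population count per code pair, scanning millions of codes stays within milliseconds on commodity hardware, which is the speed advantage the text refers to.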
To ensure the effectiveness of hashing, hash codes
should preserve the property of discrimination, i.e., objects
with similar semantics should be mapped to similar hash
codes and vice versa. Several data-aware hashing methods
have been proposed that introduce machine learning
techniques into the field of hashing to enhance the effectiveness
of hash codes [2, 5, 10]. Self-taught hashing (STH) [7] is
considered one of the state-of-the-art methods.
However, it suffers from an overfitting problem, since the opera-
tions of generating hash codes for the training data and learning the hash
function for unseen data are handled independently, which
leads to poor generalization ability. Minimal loss hashing
(MLH) [8] has shown higher search accuracy than unsu-
pervised hashing approaches, but it imposes a difficult
optimization and a slow training procedure. Spectral hash-
ing (SpH) [6] uses a separable Laplacian eigenfunction (LE)
formulation that ends up assigning more bits to directions
along which the data has greater variance. However, this
approach is somewhat heuristic and relies on the unrealis-
tic assumption that the data is uniformly distributed in a
high-dimensional rectangle. To summarize, the majority of
existing hashing methods fall under the shallow model, which performs
poorly at discovering semantic information. This drawback de-
rives from two aspects: (1) For feature extraction, existing
hashing methods are mainly based on hand-crafted features,
such as Color Histogram, GIST, SIFT, and BoW. However,
these features are limited in reflecting image se-
mantics, because they represent the semantic
content from just one perspective, either a global or a local view. (2)
For semantic preservation, a shallow model intrinsically can-
not explore the high-level semantic information contained in
the feature data.
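The two-stage self-taught scheme discussed above, i.e., first generating hash codes for the training data and then learning a hash function with those codes as supervision, can be sketched minimally as follows. The random-projection code generation and per-bit least-squares fit here are illustrative assumptions of ours, not the spectral step STH or the deep networks DSTH actually use:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 100 samples with 16-dimensional features.
X = rng.standard_normal((100, 16))

# Stage 1: generate binary codes for the training data, here by
# thresholding random projections (a stand-in for STH's spectral step).
n_bits = 8
W_proj = rng.standard_normal((16, n_bits))
codes = (X @ W_proj > 0).astype(int)          # shape (100, n_bits)

# Stage 2: treat the codes as labels and fit a hash function,
# here one linear least-squares predictor per bit.
W_hash, *_ = np.linalg.lstsq(X, 2 * codes - 1, rcond=None)

# Hashing an unseen query: project with the learned weights, take the sign.
query = rng.standard_normal(16)
query_code = (query @ W_hash > 0).astype(int)
print(query_code.shape)  # one bit per hash dimension -> (8,)
```

The overfitting criticism in the text applies exactly here: stage 1 never sees the query distribution, and stage 2 only imitates stage 1's output on the training set, so nothing ties the two stages together for out-of-sample data.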
To avoid the shortcomings of shallow-model hashing
methods, a few deep hashing methods have been proposed.
Compared to shallow-learning-based hashing, deep learning