Learning-based hashing (or Learning-to-hash) pursues a
compact binary representation from the training data. Based
on whether side information is used or not, learning-to-hash
methods can be divided into two categories: unsupervised
methods and supervised methods.
Unsupervised methods try to learn a set of similarity-
preserving hash functions only from the unlabeled data.
Representative methods in this category include Kernelized
LSH (KLSH) [2], Semantic hashing [13], Spectral
hashing [14], Anchor Graph Hashing [3], and Iterative
Quantization (ITQ) [1]. Kernelized LSH (KLSH) [2]
generalizes LSH to accommodate arbitrary kernel functions,
making it possible to learn hash functions which preserve data
points’ similarity in a kernel space. Semantic hashing [13]
generates hash functions by a deep auto-encoder via stacking
multiple restricted Boltzmann machines (RBMs). Graph-based
hashing methods, such as Spectral hashing [14] and Anchor
Graph Hashing [3], learn non-linear mappings as hash
functions which try to preserve the similarities within the
data neighborhood graph. In order to reduce the quantization
errors, Iterative Quantization (ITQ) [1] seeks to learn an
orthogonal rotation matrix which is applied to the data matrix
after principal component analysis projections.
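This two-step recipe (PCA projection followed by a learned orthogonal rotation) is simple enough to sketch. The following toy NumPy implementation is our own illustration of the idea in [1], not the authors' code: it alternates between updating the binary codes and solving an orthogonal Procrustes problem for the rotation; the bit length and iteration count are arbitrary defaults.

```python
import numpy as np

def itq(X, n_bits=32, n_iters=50, seed=0):
    """Toy illustration of ITQ [1] (not the authors' implementation).

    X: (n_samples, n_features) data matrix. Returns codes in {-1, +1}
    (via np.sign, so exact zeros would map to 0), plus the PCA
    projection W and the learned orthogonal rotation R.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                     # zero-center the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    W = Vt[:n_bits].T                          # top n_bits PCA directions
    V = X @ W                                  # projected data, (n, n_bits)
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))  # random init
    for _ in range(n_iters):
        B = np.sign(V @ R)                     # fix R, update binary codes
        # Fix B, update R: orthogonal Procrustes solved via SVD of B^T V.
        U, _, St = np.linalg.svd(B.T @ V)
        R = (U @ St).T
    return np.sign(V @ R), W, R
```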
Supervised methods aim to learn better bitwise repre-
sentations by incorporating supervised information. Notable
methods in this category include Binary Reconstruction
Embedding (BRE) [6], Minimal Loss Hashing (MLH) [15],
Supervised Hashing with Kernels (KSH) [5], Column Gen-
eration Hash (CGHash) [16], and Semi-Supervised Hashing
(SSH) [17]. Binary Reconstruction Embedding (BRE) [6]
learns hash functions by explicitly minimizing the reconstruc-
tion errors between the original distances of data points and
the Hamming distances of the corresponding binary codes.
Minimal Loss Hashing (MLH) [15] learns similarity-
preserving hash codes by minimizing a hinge-like loss func-
tion which is formulated as structured prediction with latent
variables. Supervised Hashing with Kernels (KSH) [5] is a
kernel-based supervised method which learns to hash the data
points to compact binary codes whose Hamming distances are
minimized on similar pairs and maximized on dissimilar pairs.
Column Generation Hash (CGHash) [16] is a column gen-
eration based method to learn hash functions with proximity
comparison information. Semi-Supervised Hashing (SSH) [17]
learns hash functions via minimizing similarity errors on the
labeled data while simultaneously maximizing the entropy of
the learnt hash codes over the unlabeled data. In most image
retrieval applications, the number of labeled positive samples
is small, which results in bias towards the negative samples and
over-fitting. Tao et al. [18] proposed an asymmetric bagging
and random subspace SVM (ABRS-SVM) to handle these
problems.
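To make the pairwise supervised objective concrete: since the Hamming distance between codes b_i, b_j in {-1, +1}^r equals (r - b_i^T b_j)/2, KSH [5] fits the normalized code inner products to the pairwise labels. Below is a minimal NumPy sketch of that fitting term (the function name is ours, not from [5]):

```python
import numpy as np

def ksh_fitting_term(B, S):
    """Sketch of the KSH-style objective (our naming, not from [5]).

    B: (n, r) codes in {-1, +1}; S: (n, n) pairwise labels, +1 for
    similar and -1 for dissimilar pairs. Fitting (1/r) B B^T to S is
    equivalent to pushing Hamming distances toward 0 on similar pairs
    and toward r on dissimilar pairs.
    """
    r = B.shape[1]
    return np.linalg.norm(B @ B.T / r - S, ord="fro") ** 2
```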
In supervised hashing methods for image retrieval, an emerging stream is deep-network-based methods [4], [8], [9], [19], which learn image representations as well as binary hash codes.
as well as binary hash codes. Xia et al. [4] proposed
Convolutional-Neural-Networks-based Hashing (CNNH),
which is a two-stage method. In its first stage, approximate
hash codes are learned from the supervised information.
Then, in the second stage, hash functions are learned based
on those approximate hash codes via deep convolutional
networks. Lai et al. [8] proposed a one-stage hashing
method that generates bitwise hash codes via a carefully
designed deep architecture. Zhao et al. [9] proposed a
ranking based hashing method for learning hash functions
that preserve multi-level semantic similarity between images,
via deep convolutional networks. Lin et al. [20] proposed
to learn the hash codes and image representations in a
point-wised manner, which is suitable for large-scale datasets.
Wang et al. [21] proposed Deep Multimodal Hashing with
Orthogonal Regularization (DMHOR) method for multimodal
data. All of these methods generate a single hash code for each image, which may be inappropriate for multi-label image retrieval. Different from the existing methods, the proposed method can generate multiple hash codes for an image, each code corresponding to an instance (category).
III. THE PROPOSED METHOD
Our method consists of four modules. The first module generates region proposals for an input image. The second module extracts features for the generated region proposals; it contains a deep convolution sub-network followed by a Spatial Pyramid Pooling layer [11]. The third module is a label probability calculation module, which outputs a probability matrix whose i-th row contains the probability scores of the i-th proposal belonging to each class. The fourth module is a hash coding module that first generates the instance-aware representation and then converts this representation to hash codes for either category-aware hashing or semantic hashing. In the following, we present the details of each of these modules.
A. Region Proposal Generation Module
Many methods for generating category-independent region
proposals have been proposed, e.g., Constrained Para-
metric Min-Cuts (CPMC) [22], Selective Search [23],
Multi-scale Combinatorial Grouping (MCG) [24], BInarized
Normed Gradients (BING) [25] and Geodesic Object Propos-
als (GOP) [26]. In this paper, we use GOP [26] to automatically generate region proposals for an input image. Note that other region proposal generation methods can also be used in our framework.
GOP is a method that can generate both segmentation masks and bounding box proposals. We use the code provided by the authors (http://www.philkr.net/home/gop) to generate the bounding boxes for region proposals.
B. Deep Convolution Sub-Network Module
GoogLeNet [27] is a recently proposed deep architecture
that has shown its success in object categorization and object
detection. The core of GoogLeNet is the Inception-style con-
volution module which allows increasing the depth and width
of the network while keeping reasonable computational costs.
Here we adopt the architecture of GoogLeNet as our basic
framework to compute the features for the input proposals.
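Because the proposals produced by the first module have varying sizes, the Spatial Pyramid Pooling layer [11] is what maps each proposal's feature map to a fixed-length vector. Below is a minimal PyTorch sketch of such a pooling layer, given only as an illustration (the pyramid levels are an assumption, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(feat, levels=(1, 2, 4)):
    """Max-pool a (N, C, H, W) feature map over an n x n grid per level.

    The output length C * sum(n * n) is independent of H and W, so
    proposals of different sizes yield equal-length feature vectors.
    The pyramid levels here are an assumption for illustration.
    """
    n = feat.shape[0]
    pooled = [F.adaptive_max_pool2d(feat, output_size=(lv, lv)).reshape(n, -1)
              for lv in levels]
    return torch.cat(pooled, dim=1)

# Two differently sized proposal feature maps map to the same dimension:
a = spatial_pyramid_pool(torch.randn(1, 512, 13, 9))
b = spatial_pyramid_pool(torch.randn(1, 512, 7, 21))
assert a.shape == b.shape == (1, 512 * (1 + 4 + 16))
```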