Unified Feature Learning and Hash Coding Driven by Deep Learning

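This research paper explores a deep neural network approach that performs feature learning and hash coding in a single process, with the goal of improving nearest-neighbor search in large-scale image retrieval. Traditional hashing methods usually extract hand-crafted visual features first and then generate binary codes in a separate projection or quantization step. Such feature vectors may not be well suited to the coding process, which leads to sub-optimal hash codes. The paper proposes a supervised deep hashing architecture in which carefully designed deep neural networks map images directly to binary codes. It has three main components: 1) a sub-network with a stack of convolution layers that produces effective intermediate image features; 2) a divide-and-encode module that splits those features into multiple branches, each encoded into one hash bit; and 3) a triplet ranking loss that captures the constraint that one image is more similar to a second image than to a third. The approach aims to improve hash code quality, preserve similarity, and speed up retrieval over high-dimensional data.

In a deep neural network (DNN), feature learning is the process of automatically learning representations from raw input through multiple layers of nonlinear transformations. In image retrieval, feature learning is essential for capturing the key information in an image. The proposed architecture combines this process with hash coding, which reduces the mismatch between the two independent steps of traditional methods. A convolutional neural network (CNN) serves as the feature extractor because of its strength in image processing, and its layered structure captures semantic information at different levels.

The goal of hash coding is to compress high-dimensional data into short binary codes while preserving the similarity of the original data as far as possible. In this framework, training the network lets the coding process and feature extraction optimize each other. The divide-and-encode module adapts the features produced by the convolution layers to the constraints of binarization, one branch per hash bit. The continuous floating-point outputs are then converted into binary form, typically by thresholding or approximate quantization. Supervised learning plays a key role: label information guides training, so the resulting hash codes better preserve the similarity between data points. The approach suits large-scale image retrieval because compact hash codes greatly reduce storage requirements and retrieval time.

In summary, the paper proposes a deep learning method that unifies feature learning and hash coding to improve image retrieval performance. By combining the feature extraction ability of CNNs, supervised guidance, and a carefully designed binarization process, the method is expected to make high-dimensional data retrieval more efficient and more accurate in practice.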
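The thresholding step mentioned in the summary can be illustrated with a minimal sketch. It assumes network outputs in the range (0, 1) and a cut-off of 0.5; the function names, the threshold, and the example values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def binarize(features, threshold=0.5):
    """Turn continuous activations into 0/1 hash bits by thresholding.

    The 0.5 threshold is an assumption suited to sigmoid-range outputs.
    """
    return (np.asarray(features) >= threshold).astype(np.uint8)

def hamming_distance(code_a, code_b):
    """Count the bit positions where two binary codes differ."""
    return int(np.count_nonzero(code_a != code_b))

# Hypothetical continuous outputs for three images (8 bits each).
query = binarize([0.91, 0.12, 0.77, 0.05, 0.63, 0.48, 0.88, 0.20])
similar = binarize([0.85, 0.20, 0.69, 0.11, 0.58, 0.52, 0.93, 0.15])
other = binarize([0.10, 0.95, 0.22, 0.81, 0.30, 0.74, 0.05, 0.66])

near = hamming_distance(query, similar)  # codes differ in one position
far = hamming_distance(query, other)     # codes differ in every position
```

A similarity-preserving code should give a small Hamming distance for similar images and a large one for dissimilar images, as the two hypothetical pairs here do.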

Simultaneous Feature Learning and Hash Coding with Deep Neural Networks

Hanjiang Lai†, Yan Pan∗‡, Ye Liu, and Shuicheng Yan†

†Department of Electronic and Computer Engineering, National University of Singapore, Singapore

‡School of Software, Sun Yat-Sen University, China

School of Information Science and Technology, Sun Yat-Sen University, China

Abstract

Similarity-preserving hashing is a widely-used method for nearest neighbour search in large-scale image retrieval tasks. For most existing hashing methods, an image is first encoded as a vector of hand-engineered visual features, followed by another separate projection or quantization step that generates binary codes. However, such visual feature vectors may not be optimally compatible with the coding process, thus producing sub-optimal hashing codes. In this paper, we propose a deep architecture for supervised hashing, in which images are mapped into binary codes via carefully designed deep neural networks. The pipeline of the proposed deep architecture consists of three building blocks: 1) a sub-network with a stack of convolution layers to produce the effective intermediate image features; 2) a divide-and-encode module to divide the intermediate image features into multiple branches, each encoded into one hash bit; and 3) a triplet ranking loss designed to characterize that one image is more similar to the second image than to the third one. Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other state-of-the-art supervised or unsupervised hashing methods.
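The second and third building blocks can be sketched roughly as follows. The abstract does not give the exact formulation, so several details here are assumptions made for illustration: equal-length slices, a linear projection plus sigmoid per slice, squared Euclidean distances, and a margin of 1. The names `divide_and_encode` and `triplet_ranking_loss` are hypothetical.

```python
import numpy as np

def divide_and_encode(features, weights, biases):
    """Split a feature vector into equal slices and map each slice to one value in (0, 1).

    weights has shape (n_bits, slice_len); each row acts only on its own slice,
    so each output value depends on a separate branch of the features.
    """
    n_bits = weights.shape[0]
    slices = np.reshape(features, (n_bits, -1))          # one slice per hash bit
    logits = np.sum(slices * weights, axis=1) + biases   # slice-wise linear projection
    return 1.0 / (1.0 + np.exp(-logits))                 # sigmoid squashes to (0, 1)

def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the anchor is closer to the positive
    than to the negative by at least `margin` (squared Euclidean distance)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(0.0, margin + d_pos - d_neg))

rng = np.random.default_rng(0)
features = rng.normal(size=12)       # stand-in for intermediate image features
weights = rng.normal(size=(4, 3))    # 4 hash bits, 3 features per slice
biases = np.zeros(4)
soft_code = divide_and_encode(features, weights, biases)

a = np.array([1.0, 0.0])
p = np.array([1.0, 0.0])
n = np.array([0.0, 1.0])
satisfied = triplet_ranking_loss(a, p, n)  # ranking holds, so no penalty
violated = triplet_ranking_loss(a, n, p)   # ranking reversed, so penalized
```

In a full system the loss would be backpropagated through both the encoding module and the convolution layers, which is what lets the features and the codes be learned together.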

1. Introduction

With the ever-growing large-scale image data on the Web, much attention has been devoted to nearest neighbor search via hashing methods. In this paper, we focus on learning-based hashing, an emerging stream of hash methods that learn similarity-preserving hash functions to encode input data points (e.g., images) into binary codes.

∗Corresponding author: Yan Pan, email: panyan5@mail.sysu.edu.cn.

Many learning-based hashing methods have been proposed, e.g., [8, 9, 4, 12, 16, 27, 14, 25, 3]. The existing learning-based hashing methods can be categorized into unsupervised and supervised methods, based on whether supervised information (e.g., similarities or dissimilarities on data points) is involved. Compact bitwise representations are advantageous for improving the efficiency in both storage and search speed, particularly in big data applications. Compared to unsupervised methods, supervised methods usually embed the input data points into compact hash codes with fewer bits, with the help of supervised information.
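The storage and search benefit of compact bitwise representations can be made concrete with a small sketch. It packs 48-bit codes into 6 bytes per image and ranks a database by Hamming distance using XOR and a bit-count lookup table. The database size, code length, and random codes are arbitrary assumptions for illustration.

```python
import numpy as np

def pack_codes(bits):
    """Pack an (n, n_bits) array of 0/1 values into bytes, 8 bits per byte."""
    return np.packbits(np.asarray(bits, dtype=np.uint8), axis=1)

# Lookup table: number of set bits in each possible byte value.
POPCOUNT = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint8)

def hamming_rank(query_packed, db_packed):
    """Return database indices sorted by Hamming distance to the query, plus the distances."""
    xor = np.bitwise_xor(db_packed, query_packed)  # set bits mark differing positions
    dists = POPCOUNT[xor].sum(axis=1)              # count differing bits per database code
    order = np.argsort(dists, kind="stable")
    return order, dists

rng = np.random.default_rng(0)
db_bits = rng.integers(0, 2, size=(1000, 48))  # hypothetical 48-bit codes for 1000 images
db = pack_codes(db_bits)                       # shape (1000, 6): 6 bytes per image

order, dists = hamming_rank(db[42], db)        # use image 42 as the query
```

For comparison, a 512-dimensional float32 descriptor would take 2048 bytes per image, so a 48-bit code is several hundred times smaller, and the distance computation reduces to byte-wise XOR and table lookups.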

In the pipelines of most existing hashing methods for images, each input image is firstly represented by a vector of traditional hand-crafted visual descriptors (e.g., GIST [18], HOG [1]), followed by separate projection and quantization steps to encode this vector into a binary code. However, such fixed hand-crafted visual features may not be optimally compatible with the coding process. In other words, a pair of semantically similar/dissimilar images may not have feature vectors with relatively small/large Euclidean distance. Ideally, it is expected that an image feature representation can sufficiently preserve the image similarities, which can be learned during the hash learning process. Very recently, Xia et al. [27] proposed CNNH, a supervised hashing method in which the learning process is decomposed into a stage of learning approximate hash codes from the supervised information, followed by a stage of simultaneously learning hash functions and image representations based on the learned approximate hash codes. However, in this two-stage method, the learned approximate hash codes are used to guide the learning of the image representation, but the learned image representation cannot give feedback for learning better approximate hash codes. This one-way interaction thus still has limitations.
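For contrast, the conventional pipeline of fixed features followed by separate projection and quantization steps might look like the sketch below. It uses random hyperplane projections with sign quantization as a simple stand-in; this is not any specific method cited above, and the random vectors only play the role of hand-crafted descriptors such as GIST.

```python
import numpy as np

def project(features, projection):
    """Projection step: map fixed descriptors to n_bits real values."""
    return features @ projection

def quantize(projected):
    """Quantization step: keep only the sign of each projected value."""
    return (projected > 0).astype(np.uint8)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 512))     # stand-in for precomputed hand-crafted descriptors
projection = rng.normal(size=(512, 32))  # random hyperplanes, one per hash bit

codes = quantize(project(features, projection))  # 5 images, 32 bits each
```

Nothing in this pipeline ever updates `features`: whatever the hashing step learns or chooses, the descriptors stay fixed, which is the incompatibility the paragraph above describes.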

In this paper, we propose a “one-stage” supervised hashing method via a deep architecture that maps input images to binary codes. As shown in Figure 1, the proposed deep architecture has three building blocks: 1) shared stacked
