Binary Generative Adversarial Networks for Image Retrieval
Jingkuan Song
ABSTRACT
The most striking successes in image retrieval using deep hashing
have mostly involved discriminative models, which require labels. In
this paper, we use binary generative adversarial networks (BGAN) to
embed images to binary codes in an unsupervised way. By restricting
the input noise variable of generative adversarial networks (GAN)
to be binary and conditioned on the features of each input image,
BGAN can simultaneously learn a binary representation per image,
and generate an image plausibly similar to the original one. In the
proposed framework, we address two main problems: 1) how to
directly generate binary codes without relaxation, and 2) how to equip
the binary representation with the ability to support accurate image retrieval.
We resolve these problems by proposing a new sign-activation strategy
and a loss function steering the learning process, which consists
of an adversarial loss, a content loss, and a neighborhood
structure loss. Experimental results on standard datasets
(CIFAR-10, NUS-WIDE, and Flickr) demonstrate that our BGAN
significantly outperforms existing hashing methods by up to 107%
in terms of mAP (see Table 3). Our anonymous code is available at:
https://github.com/htconquer/BGAN.
KEYWORDS
Generative Adversarial Networks, Hashing, Image Retrieval
ACM Reference format:
Jingkuan Song. 2017. Binary Generative Adversarial Networks for Image
Retrieval. In Proceedings of ACM Conference, Washington, DC, USA, July
2017 (Conference’17), 9 pages.
DOI: 10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
With the rapidly increasing number of images, similarity search
in large image collections has been actively pursued in a number
of domains, including computer vision, information retrieval, and
pattern recognition [39, 43]. However, exact nearest-neighbor (NN)
search is often intractable because of the size of the dataset and the high
dimensionality of images. Instead, approximate nearest-neighbor
(ANN) search is more practical and can achieve orders-of-magnitude
speed-ups compared to exact NN search [18, 39].
Recently, learning-based hashing methods [16, 17, 27, 34, 43, 45]
have become the mainstream for scalable image retrieval due to
their compact binary representation and efficient Hamming distance
calculation. Such approaches embed data points to compact binary
codes through hash functions, which can be generally expressed as:
\[ b = h(x) \in \{0, 1\}^L \tag{1} \]
where $x \in \mathbb{R}^{M \times 1}$, $h(\cdot)$ denotes the hash functions, and $b$ is a binary vector
with code length $L$.
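To make the efficiency argument concrete, the following NumPy sketch (our own toy example with a hypothetical random-projection hash, not a method from this paper) encodes a database into L-bit codes and ranks it by Hamming distance:

```python
import numpy as np

def toy_hash(x, projections):
    """Hypothetical hash h(x): project features and threshold at zero,
    yielding an L-bit binary code b in {0, 1}^L."""
    return (x @ projections > 0).astype(np.uint8)

L, M, N = 64, 512, 10000                      # code length, feature dim, database size
rng = np.random.default_rng(0)
W = rng.standard_normal((M, L))               # random projections (toy stand-in for a learned h)
database = rng.standard_normal((N, M))
query = rng.standard_normal(M)

db_codes = toy_hash(database, W)              # N x L binary codes
q_code = toy_hash(query, W)                   # L-bit query code

# Hamming distance = number of differing bits; far cheaper than
# exact NN search over the original real-valued features.
hamming = np.count_nonzero(db_codes != q_code, axis=1)
top10 = np.argsort(hamming)[:10]              # approximate nearest neighbors
```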
Hashing methods can be generally categorized as being unsuper-
vised or supervised. The unsupervised learning of a hash function
is usually based on the criterion of preserving important properties
of the training data points in the original space. Typical approaches
target pairwise similarity preservation (i.e., the similarity/distance
of binary codes should be consistent with that of the original data
points) [17, 32, 46], multi-wise similarity preservation (i.e., the simi-
larity orders over more than two items computed from the input space
and the coding space should be preserved) [33, 44], or implicit simi-
larity preservation (i.e., pursuing effective space partitioning without
explicitly evaluating the relation between the distances/similarities
in the input and coding spaces) [15, 19]. A fundamental limitation
of a hashing method geared to preserve a particular image property
is that its performance may degrade when it is applied to a context
where a different property is relevant.
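As a concrete illustration of the pairwise criterion, the PyTorch sketch below matches the similarities of (relaxed) codes to the similarities of the original data points; it is a generic illustration under assumed choices (Gaussian affinity in the input space, tanh-relaxed codes), not any specific cited method:

```python
import torch

def pairwise_preservation_loss(codes, features, sigma=1.0):
    """Generic pairwise similarity preservation: the similarity of the
    (relaxed) codes should be consistent with the similarity of the
    original data points."""
    # Similarity in the original feature space (assumed Gaussian affinity).
    d2 = torch.cdist(features, features) ** 2
    s_orig = torch.exp(-d2 / (2 * sigma ** 2))
    # Similarity of relaxed codes in (-1, 1), rescaled to [0, 1].
    s_code = (codes @ codes.t()) / codes.shape[1]
    s_code = (s_code + 1) / 2
    return torch.mean((s_code - s_orig) ** 2)

feats = torch.randn(32, 512)                  # image features
relaxed = torch.tanh(torch.randn(32, 64))     # relaxed 64-bit codes
loss = pairwise_preservation_loss(relaxed, feats)
```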
Supervised hashing is designed to generate the binary codes based
on predefined labels [8, 27, 40]. For example, Strecha et al. [40]
developed a supervised hashing method which maximizes the between-class
Hamming distance and minimizes the within-class Hamming dis-
tance. The methods in [8, 27] learn the hash codes so as to approximate
the pairwise label similarity. Supervised hashing methods usually
significantly outperform unsupervised methods. However, the infor-
mation that can be used for supervision is typically scarce.
More recently, deep learning has been introduced in the devel-
opment of hashing algorithms [4, 6, 12, 13, 28, 37, 47], leading
to a new generation of deep hashing algorithms. Due to their powerful
feature representations, remarkable image retrieval performance has
been reported using the hashes obtained in this way. However, a
number of issues still remain open. The most successful
deep hashing methods are usually supervised and require labels. The
labels are, however, scarce and subjective. Unsupervised approaches,
on the other hand, cannot take full advantage of current deep
learning models, and thus yield unsatisfactory performance [28]. An-
other issue is the non-smooth sign-activation function used to generate
the binary codes, which, despite several ideas proposed to tackle
it [2, 6, 26], still makes standard back-propagation infeasible.
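To see why, the sign function has zero gradient almost everywhere, so gradients cannot flow through it. A common workaround in prior work is a straight-through estimator, sketched below in PyTorch purely as background; it is not the sign-activation strategy proposed in this paper:

```python
import torch

class SignSTE(torch.autograd.Function):
    """sign() has zero gradient almost everywhere, so plain back-propagation
    through it is infeasible. The straight-through estimator passes the
    gradient as if sign() were the identity (clipped to |x| <= 1)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Block gradients where |x| > 1, pass them through otherwise.
        return grad_output * (x.abs() <= 1).float()

x = torch.randn(8, 64, requires_grad=True)    # relaxed activations
b = SignSTE.apply(x)                          # binary codes in {-1, +1}
b.sum().backward()                            # gradients reach x despite sign()
```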
To address the above issues, we propose an unsupervised hashing
method that deploys a generative adversarial network (GAN) [36].
GANs have proven effective at generating synthetic data similar to the
training data from a latent space. Therefore, if we restrict the input
noise variable of the GAN to be binary and conditioned on the features
of each input image, we can learn a binary representation for each
image and simultaneously generate an image plausibly similar to the
original one. Feeding the generated images through a “discriminator”
that verifies them with respect to the training images removes the
need for supervision and the