Deep-Learning-Driven Image Retrieval: A Survey of Progress from 2012 to 2020

PDF | 3.74 MB | Updated 2024-07-14


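The survey paper "Deep Image Retrieval: A Survey", covering 2012 to 2020, examines the challenges created by the rapid growth of visual content that has accompanied advances in information technology, especially in social media, medical image analysis, and robotics. Content-based image retrieval (CBIR), the ability to find similar images in a database, has long been a core topic of computer vision research. Traditional CBIR methods can no longer meet the growing demands for real-time performance and accuracy, and the introduction of deep learning has transformed the field.

In recent years, applying deep learning to CBIR has markedly improved retrieval performance. By imitating the complex structure of neural networks in the human brain, models such as convolutional neural networks (CNNs), deep belief networks (DBNs), and generative adversarial networks (GANs) automate feature learning and can therefore better understand and represent image content. These algorithms capture deep, high-level image features and improve the precision of similarity matching.

The paper reviews research progress in deep-learning-based CBIR and covers many novel methods and techniques, including but not limited to: deep feature extraction, for example with the ResNet and Inception model families; image encoding and indexing methods such as deep image embedding and deep hashing; and end-to-end deep learning architectures such as deep retrieval networks, which map raw pixels directly into a latent retrieval space. It also discusses commonly used datasets, such as ImageNet, COCO, Caltech-256, and MIRFLICKR-1M, which serve as widely used benchmarks for evaluating algorithm performance. Evaluation metrics have been refined as well, including mean average precision (mAP), precision-recall curves, and mean average precision per region (mAPr), to measure system performance comprehensively.

Although deep learning has achieved notable results in CBIR, challenges remain, including model interpretability, the need for large training sets, computational efficiency, and cross-modal retrieval (for example, joint text and image retrieval). Future research may focus on these problems, for example by developing more lightweight models, increasing retrieval speed, strengthening cross-modal integration, and balancing privacy protection with fairness. The survey gives a detailed overview of recent deep-learning results in image retrieval, serves as a valuable reference for researchers, and offers forward-looking suggestions for future work. Deep learning has not only improved the accuracy and efficiency of image retrieval but also opened new research paths in related fields such as computer vision, artificial intelligence, and big-data analytics.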
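The summary names mean average precision (mAP) as a standard retrieval metric. As a minimal sketch, assuming each query comes with a ranked list of database ids and a set of relevant ids, non-interpolated AP and mAP can be computed as below. The function names are illustrative; some benchmark protocols also interpolate precision or ignore "junk" images, which this sketch does not do.

```python
def average_precision(ranked_ids, relevant):
    """Non-interpolated AP: mean of precision@k over the ranks k that hold a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this rank
    # Relevant items that never appear in the ranking contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(rankings, ground_truth):
    """mAP: average of per-query AP; both arguments are dicts keyed by query id."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return sum(aps) / len(aps)


rankings = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y"]}
ground_truth = {"q1": {"a", "c"}, "q2": {"y"}}
print(round(mean_average_precision(rankings, ground_truth), 4))  # 0.6667
```

For q1 the relevant items sit at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2 = 5/6; q2 gives 1/2, so mAP = 2/3.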


DEEP IMAGE RETRIEVAL: A SURVEY 5

[Figure 5 image not reproduced. Panels: MAC (channel-wise H×W max pooling → C×1); R-MAC (channel-wise max pooling for each region → C×K → C×1); GeM (channel-wise H×W generalized-mean pooling → C×1); SPoC (Gaussian weighting centred at (H/2, W/2), then channel-wise H×W sum pooling → C×1); CroW (spatial and channel weighting, then channel-wise H×W sum pooling → C×1); CAM+CroW (class activation mapping from the classifier, computed for the top K (K<L) of L classes, selects the channel weights, then channel-wise H×W sum pooling → C×1).]

Fig. 5: Representative methods in single feedforward frameworks, focusing on convolutional feature maps: MAC [48], R-MAC [28], GeM pooling [42], SPoC with the Gaussian weighting scheme [7], CroW [10], and CAM+CroW [29]. Note that g_s(·) and g_c(·) represent spatial-wise and channel-wise weighting functions, respectively.
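Three of the pooling schemes in Fig. 5 reduce to a few lines each. The sketch below assumes a C×H×W feature map stored as nested lists of non-negative activations; the function names, the GeM exponent p = 3, and the SPoC bandwidth (σ equal to one sixth of the shorter side, i.e. one third of the centre-to-border distance) are assumptions. The PCA whitening and normalization applied in the cited methods, as well as the CroW and CAM+CroW weightings, are omitted.

```python
import math

# Assumed layout: fmap[c][i][j] holds channel c at row i, column j of a C x H x W
# map of non-negative activations (e.g. the last conv layer after ReLU).

def mac(fmap):
    """MAC: channel-wise max over all H x W positions -> C-dim vector."""
    return [max(max(row) for row in channel) for channel in fmap]


def spoc(fmap):
    """SPoC: channel-wise sum pooling, weighted by a Gaussian centred at (H/2, W/2)."""
    H, W = len(fmap[0]), len(fmap[0][0])
    sigma = min(H, W) / 6.0  # assumption: 1/3 of the centre-to-nearest-border distance
    # (i + 0.5, j + 0.5) is the centre of cell (i, j), so the weights are symmetric.
    weights = [[math.exp(-((i + 0.5 - H / 2) ** 2 + (j + 0.5 - W / 2) ** 2) / (2 * sigma ** 2))
                for j in range(W)] for i in range(H)]
    return [sum(weights[i][j] * channel[i][j] for i in range(H) for j in range(W))
            for channel in fmap]


def gem(fmap, p=3.0, eps=1e-6):
    """GeM: (mean of x^p)^(1/p) per channel; p = 1 is average pooling, large p tends to MAC."""
    out = []
    for channel in fmap:
        values = [max(x, eps) for row in channel for x in row]  # clamp keeps x^p real
        out.append((sum(x ** p for x in values) / len(values)) ** (1.0 / p))
    return out


fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 5.0]]]
print(mac(fmap))            # [4.0, 5.0]
print(gem(fmap, p=1.0)[0])  # 2.5
```

GeM interpolates between the other two ideas: its single parameter p moves the descriptor from average pooling (p = 1) towards MAC (p → ∞).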

[...] is more important than final classification probabilities. This section surveys the strategies that have been developed to improve the quality of feature representations, particularly those based on feature extraction/fusion (Section 3.1) and feature enhancement (Section 3.2).

3.1 Deep Feature Extraction

3.1.1 Network Feedforward Scheme

a. Single Feedforward Pass Methods.

Single feedforward pass methods feed the whole image into an off-the-shelf model to extract features. The approach is relatively efficient since the input image is fed through the network only once. For these methods, both the fully-connected layer and the last convolutional layer can be used as feature extractors [70]. The fully-connected layer has a global receptive field, so it produces more semantic-aware features [13]. After normalization and dimensionality reduction, these features are used directly for similarity measurement without further processing, which admits efficient search strategies [25, 26, 34].
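A minimal sketch of the direct similarity measurement described above, assuming the fully-connected features are already extracted as plain Python lists: after L2 normalization, cosine similarity reduces to a dot product, so ranking a database is a single sort. Dimensionality reduction (for example PCA) is left out, and `rank_database` is a hypothetical helper, not an API from the cited works.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Scale v to unit Euclidean length (eps guards against all-zero vectors)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def rank_database(query, database):
    """Indices of database vectors sorted by cosine similarity to the query, best first."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(d))) for d in database]
    return sorted(range(len(database)), key=lambda idx: -scores[idx])


query = [1.0, 0.0]
database = [[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]]
print(rank_database(query, database))  # [1, 2, 0]
```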


Fig. 6: Image patch generation schemes: (a) Rigid grid; (b) Spatial pyramid modeling (SPM) splits an image into different scales and positions (blue, green and red boxes); (c) Dense patch sampling, where a fixed-size sliding window samples the image; (d) Region proposals (RP), in which the specific object or instance is extracted as region proposals.
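Schemes (a) to (c) in Fig. 6 are purely geometric and can be sketched as box generators. Boxes are assumed to be end-exclusive (x0, y0, x1, y1) tuples in pixels; the pyramid levels (1, 2, 4) are an illustrative choice, and (d) region proposals are not sketched because they require a separate proposal method.

```python
def rigid_grid(width, height, rows, cols):
    """(a) Rigid grid: rows x cols non-overlapping cells as (x0, y0, x1, y1), end-exclusive."""
    return [(c * width // cols, r * height // rows,
             (c + 1) * width // cols, (r + 1) * height // rows)
            for r in range(rows) for c in range(cols)]


def spatial_pyramid(width, height, levels=(1, 2, 4)):
    """(b) SPM-style boxes: an n x n rigid grid at every pyramid level n."""
    return [box for n in levels for box in rigid_grid(width, height, n, n)]


def sliding_windows(width, height, size, stride):
    """(c) Dense sampling: a fixed-size square window moved with a fixed stride."""
    return [(x, y, x + size, y + size)
            for y in range(0, height - size + 1, stride)
            for x in range(0, width - size + 1, stride)]


print(rigid_grid(100, 100, 2, 2)[0])  # (0, 0, 50, 50)
print(len(spatial_pyramid(64, 64)))   # 21
```

The number of boxes, and therefore of forward passes in a multiple-pass method, grows quickly with pyramid depth or with a smaller stride.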

Using the fully-connected layer may result in insufficient performance since it lacks geometric invariance and spatial information, so the last convolutional layer can be examined instead. Research on convolutional features focuses on improving their discrimination; representative strategies are shown in Figure 5. One direction is to treat regions in the feature maps as different sub-vectors, so that combinations of the sub-vectors from all feature maps represent the input image. For instance, Gordo et al. [38] apply regional maximum activation of convolutions (R-MAC) [28] to obtain relevant regions on each feature map, which filters out some irrelevant (background) information and helps extract instance-relevant features. Inspired by R-MAC, Li et al. [59] propose a non-linear feature embedding method for visual object retrieval and report significant performance improvements over the state of the art.
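A simplified sketch of R-MAC aggregation, assuming the same nested-list C×H×W layout used for the Fig. 5 pooling schemes: regional max pooling, per-region L2 normalization, summation, and a final L2 normalization. The region layout here (square regions of side 2·min(H, W)/(l + 1) at scale l, with about l evenly spaced positions along the shorter axis) is a hypothetical approximation of the multi-scale grid in [28], and the PCA whitening of regional vectors is omitted.

```python
import math

def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def _starts(dim, side, count):
    """`count` evenly spaced start offsets for a window of length `side` along one axis."""
    if count == 1:
        return [(dim - side) // 2]  # a single window is centred
    step = (dim - side) / (count - 1)
    return sorted({int(round(k * step)) for k in range(count)})


def rmac_regions(H, W, levels=3):
    """Hypothetical R-MAC grid: at scale l, square regions of side 2*min(H, W)/(l + 1)."""
    short = min(H, W)
    regions = []
    for l in range(1, levels + 1):
        side = max(1, (2 * short) // (l + 1))
        for y in _starts(H, side, max(1, round(l * H / short))):
            for x in _starts(W, side, max(1, round(l * W / short))):
                regions.append((y, x, side))
    return regions


def rmac(fmap, levels=3):
    """Regional MAC vectors, each L2-normalized, summed, then L2-normalized again."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    total = [0.0] * C
    for y, x, side in rmac_regions(H, W, levels):
        regional = [max(ch[i][j] for i in range(y, y + side) for j in range(x, x + side))
                    for ch in fmap]  # channel-wise max inside one region
        # The original pipeline PCA-whitens each regional vector here; omitted in this sketch.
        total = [t + r for t, r in zip(total, l2_normalize(regional))]
    return l2_normalize(total)


fmap = [[[float(4 * i + j) for j in range(4)] for i in range(4)],
        [[1.0] * 4 for _ in range(4)]]
print(len(rmac_regions(4, 4)))  # 14 regions: 1 + 4 + 9
print(rmac(fmap))
```

Normalizing each regional vector before summing keeps one strongly activated region from dominating the global descriptor.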

b. Multiple Feedforward Pass Methods.

Compared to single-pass schemes, multiple-pass methods are more time-consuming [8] because several patches are generated from an input image and each of them is fed into the network before being encoded into a final global feature.
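The cost difference is easy to see in code: a multiple-pass method calls the network once per patch. In this sketch `toy_extractor` is a hypothetical stand-in for a CNN forward pass, and patch features are merged by averaging followed by L2 normalization, which is only one simple encoding choice.

```python
import math

def crop(image, box):
    """Cut an end-exclusive (x0, y0, x1, y1) box out of a 2-D list image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def multi_pass_descriptor(image, boxes, extract_feature):
    """One network pass per patch, then average the patch features into one global vector."""
    feats = [extract_feature(crop(image, box)) for box in boxes]  # len(boxes) forward passes
    pooled = [sum(column) / len(feats) for column in zip(*feats)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]


def toy_extractor(patch):
    """Hypothetical stand-in for a CNN forward pass: [mean, max] of the patch."""
    values = [v for row in patch for v in row]
    return [sum(values) / len(values), max(values)]


image = [[float(4 * i + j) for j in range(4)] for i in range(4)]
boxes = [(0, 0, 2, 2), (2, 0, 4, 2), (0, 2, 2, 4), (2, 2, 4, 4)]
print(multi_pass_descriptor(image, boxes, toy_extractor))  # [0.6, 0.8]
```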

Multiple-pass strategies can lead to higher retrieval accuracy since representations are produced in two stages: patch detection and patch description. Multi-scale image patches are obtained using sliding windows [26, 71], random cropping [25, 57], and the spatial pyramid model (SPM) [32], as illustrated in Figure 6. For example, Xu et al. [72] randomly sample windows within an image at different scales and positions, then calculate “edgeness” scores that represent the edge density within the windows.
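Scoring random windows by edge density can be sketched with an integral image, which makes each window's score O(1). The binary edge map is assumed to be precomputed by some edge detector, and the window sizes, count, and helper names are illustrative; this is not Xu et al.'s exact procedure.

```python
import random

def integral_image(edges):
    """Summed-area table with a zero border: S[y][x] counts edge pixels in rows < y, cols < x."""
    h, w = len(edges), len(edges[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += edges[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S


def edgeness(S, box):
    """Edge density of an end-exclusive (x0, y0, x1, y1) window, O(1) per window."""
    x0, y0, x1, y1 = box
    count = S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
    return count / ((x1 - x0) * (y1 - y0))


def random_windows(width, height, n, sizes, rng):
    """n random square windows; every size in `sizes` must fit inside the image."""
    boxes = []
    for _ in range(n):
        size = rng.choice(sizes)
        x0, y0 = rng.randint(0, width - size), rng.randint(0, height - size)
        boxes.append((x0, y0, x0 + size, y0 + size))
    return boxes


edges = [[1, 0, 0, 0] for _ in range(4)]  # toy binary edge map: the left column is edges
S = integral_image(edges)
windows = random_windows(4, 4, n=5, sizes=[2, 4], rng=random.Random(0))
ranked = sorted(windows, key=lambda b: edgeness(S, b), reverse=True)  # densest first
print(edgeness(S, (0, 0, 2, 2)))  # 0.5
```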

These patch detection methods lack retrieval efficiency on large-scale datasets because irrelevant patches are also fed into the deep network, so the image patches need to be analyzed first [28]. As an example, Cao et al. [73] propose to merge image patches into larger regions with different hyper-parameters, and then cast hyper-parameter selection as an optimization problem whose objective is to maximize the similarity between the features of the query and those of the candidates.
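The idea of treating merging hyper-parameters as something to optimize against query similarity can be illustrated with a brute-force search. This is a hypothetical stand-in, not Cao et al.'s formulation: patch features sit on a grid, the only hyper-parameter is a block size s, blocks are merged by element-wise max, and the s whose best merged region is most similar (cosine) to the query wins.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def merge_patches(grid, s):
    """Merge an R x C grid of patch features into non-overlapping s x s blocks (element-wise max)."""
    R, C = len(grid), len(grid[0])
    regions = []
    for r in range(0, R - s + 1, s):
        for c in range(0, C - s + 1, s):
            members = [grid[i][j] for i in range(r, r + s) for j in range(c, c + s)]
            regions.append([max(vals) for vals in zip(*members)])
    return regions


def best_merge(query, grid, sizes):
    """Return (score, s) for the block size s whose best region matches the query best."""
    best_score, best_s = -math.inf, None
    for s in sizes:
        if s > min(len(grid), len(grid[0])):
            continue  # block would not fit in the grid
        score = max(cosine(query, region) for region in merge_patches(grid, s))
        if score > best_score:
            best_score, best_s = score, s
    return best_score, best_s


grid = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 1.0]]]  # 2 x 2 patches with 2-D features each
print(best_merge([1.0, 1.0], grid, sizes=[1, 2]))  # best block size is 2 for this query
```

A query spread over two patches prefers the larger merged region, while a query matching a single patch prefers s = 1, which is the trade-off the selection step has to resolve.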


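Related resources:

Deep-learning image retrieval (CBIR): a major survey of a decade

A survey document on image retrieval

Zhejiang University's latest survey paper on "multimodal deep learning"

Graduation defense - ASP.NET image retrieval graduation project (source code, thesis, proposal report, foreign-literature translation, literature review, defense PPT).rar

National University of Singapore's latest survey paper on "large-scale deep learning optimization"

ASP.NET image retrieval graduation project (source code + thesis + proposal report + foreign-literature translation + literature review + defense PPT)

Image retrieval graduation project: a complete guide from proposal to literature review

A survey of content-based image retrieval systems: techniques and progress

A survey of depth-image-driven methods for accurate 3D model retrieval

An in-depth survey of 3D model retrieval: status, methods and challenges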
