Learning global representations for image search 5
tion in an end-to-end manner. To that end, we leverage a three-stream Siamese
network with a triplet ranking loss. We also describe how to learn the pooling
mechanism using a region proposal network (RPN) instead of relying on a rigid
grid (Section 3.2). Finally, we detail the overall descriptor extraction process for
a given image (Section 3.3).
3.1 Learning to retrieve particular objects
R-MAC revisited. Recently, Tolias et al. [14] presented R-MAC, a global im-
age representation particularly well-suited for image retrieval. The R-MAC ex-
traction process is summarized in any of the three streams of the network in
Fig. 1 (top). In a nutshell, the convolutional layers of a pre-trained network
(e.g. VGG16 [46]) are used to extract activation features from the images, which
can be understood as local features that do not depend on the image size or
its aspect ratio. Local features are max-pooled in different regions of the image
using a multi-scale rigid grid with overlapping cells. These pooled region features
are independently $\ell_2$-normalized, whitened with PCA, and $\ell_2$-normalized again.
Unlike spatial pyramids, instead of concatenating the region descriptors, they
are sum-aggregated and $\ell_2$-normalized, producing a compact vector whose size
(typically 256-512 dimensions) is independent of the number of regions in the
image. Comparing two image vectors with dot-product can then be interpreted
as an approximate many-to-many region matching.
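The aggregation pipeline above can be sketched in plain NumPy. The feature map, region grid, and whitening parameters below are placeholders, not the paper's actual configuration:

```python
import numpy as np

def l2n(x, eps=1e-6):
    # l2-normalize along the last axis
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def rmac(feature_map, regions, pca_mean, pca_proj):
    """Aggregate a CNN feature map of shape (C, H, W) into one R-MAC vector.

    regions:   list of (x0, y0, x1, y1) cells from the multi-scale rigid grid
    pca_mean,
    pca_proj:  whitening parameters (illustrative stand-ins for learned ones)
    """
    pooled = []
    for (x0, y0, x1, y1) in regions:
        # max-pool the activations inside each region -> one C-dim local feature
        pooled.append(feature_map[:, y0:y1, x0:x1].max(axis=(1, 2)))
    pooled = l2n(np.stack(pooled))                   # l2-normalize each region
    pooled = l2n((pooled - pca_mean) @ pca_proj.T)   # PCA-whiten, l2-normalize again
    return l2n(pooled.sum(axis=0))                   # sum-aggregate, normalize once more
```

The dot-product between two such unit-norm vectors is what the text interprets as an approximate many-to-many region matching.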
One key aspect to notice is that all these operations are differentiable. In
particular, the spatial pooling in different regions is equivalent to Region of
Interest (ROI) pooling [47], which is differentiable [48]. The PCA projection can
be implemented with a shift followed by a fully connected (FC) layer, while the
gradients of the sum-aggregation of the different regions and the $\ell_2$-normalization are
also easy to compute. Therefore, one can implement a network architecture that,
given an image and the precomputed coordinates of its regions (which depend
only on the image size), produces the final R-MAC representation in a single
forward pass. More importantly, one can backpropagate through the network ar-
chitecture to learn the optimal weights of the convolutions and the projection.
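The claim that the PCA projection reduces to a shift plus an FC layer can be checked directly; `P` and `mu` below are stand-ins for learned whitening parameters, not values from the paper:

```python
import numpy as np

np.random.seed(0)
P = np.random.randn(4, 8)   # hypothetical PCA projection (rows = components)
mu = np.random.randn(8)     # data mean (the "shift")
x = np.random.randn(8)      # a pooled region feature

# PCA whitening written as shift-then-projection
y_pca = P @ (x - mu)

# the same operation as a fully connected layer with weights W and bias b
W, b = P, -P @ mu
y_fc = W @ x + b

assert np.allclose(y_pca, y_fc)
```

Because the two forms are algebraically identical, the projection and shift become ordinary trainable parameters that gradients flow through during backpropagation.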
Learning for particular instances. We depart from previous works on fine-
tuning networks for image retrieval that optimize classification using cross-
entropy loss [17]. Instead, we consider a ranking loss based on image triplets.
Given a query, a relevant image, and a non-relevant image, it explicitly enforces
that the relevant image is closer to the query than the non-relevant one. To
do so, we use a three-stream Siamese network in which the weights of the streams
are shared (see Fig. 1, top). Note that the number and size of the weights in the
network (the convolutional filters and the shift and projection) are independent of
the size of the images, so we can feed each stream with images of different
sizes and aspect ratios.
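A hinge-based triplet ranking loss of this kind can be sketched as follows; the exact formulation and margin used for training are those specified in the text, so the margin value here is purely illustrative:

```python
import numpy as np

def triplet_margin_loss(q, d_pos, d_neg, margin=0.1):
    """Hinge-based triplet ranking loss on l2-normalized descriptors.

    Returns zero once the relevant descriptor d_pos is closer to the
    query q than d_neg by at least the margin.
    """
    d_p = np.sum((q - d_pos) ** 2)   # squared distance to the relevant image
    d_n = np.sum((q - d_neg) ** 2)   # squared distance to the non-relevant one
    return max(0.0, margin + d_p - d_n)
```

When the triplet is already correctly ranked with enough slack, the loss (and hence the gradient) vanishes, so training focuses on hard or mis-ranked triplets.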
Let $I_q$ be a query image with R-MAC descriptor $q$, $I^+$ be a relevant image
with descriptor $d^+$, and $I^-$ be a non-relevant image with descriptor $d^-$. We