Paper:

Modality-Reconstructed Cross-Media Retrieval via Sparse Neural Networks Pre-Trained by Restricted Boltzmann Machines

Bin Zhang, Huaxiang Zhang†, Jiande Sun, Zhenhua Wang, Hongchen Wu, and Xiao Dong

Department of Computer Science, Shandong Normal University
No. 1, University Road, Changqing District, Jinan 250300, China
E-mail: huaxzhang@163.com
†Corresponding author
[Received August 24, 2017; accepted May 16, 2018]
Cross-media retrieval has attracted considerable research interest, and a significant number of works focus on mapping heterogeneous data into a common subspace, using a pair of projection matrices corresponding to each modality, before performing similarity comparison. In contrast, we reconstruct one modality (e.g., images) into the other (e.g., texts) using a model named Modality-Reconstructed Cross-media Retrieval via sparse neural networks pre-trained by Restricted Boltzmann Machines (MRCR-RSNN), so that we can project one modality into the space of the other directly. The input of the model is the low-level features of one modality, and the output is those of the other; cross-media retrieval is then implemented based on the similarities of their representations. Our model requires no manual annotation, which makes it widely applicable; it is simple but effective. We evaluate the performance of our method on several benchmark datasets, and the experimental results demonstrate its effectiveness in terms of Mean Average Precision (MAP) and Precision-Recall (PR).
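As a rough, hypothetical sketch of this reconstruction idea (the layer sizes, feature dimensions, and helper names below are illustrative assumptions, and the RBM pre-training step is omitted), one modality's features can be mapped directly into the other's feature space and retrieval performed by similarity ranking:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 1000-dim image features in, 100-dim
# text features out. In the paper each layer would be pre-trained
# as an RBM; here the weights are simply randomly initialized.
reconstructor = nn.Sequential(
    nn.Linear(1000, 512), nn.Sigmoid(),
    nn.Linear(512, 256), nn.Sigmoid(),
    nn.Linear(256, 100),
)

def retrieve_texts(img_feat, text_feats):
    """Rank database texts by cosine similarity to the
    reconstructed (image -> text space) query."""
    with torch.no_grad():
        rep = reconstructor(img_feat)      # image projected into text space
    rep = rep / rep.norm()
    db = text_feats / text_feats.norm(dim=1, keepdim=True)
    return torch.argsort(db @ rep, descending=True)  # most similar first

# Usage with random placeholder features.
query = torch.randn(1000)
texts = torch.randn(50, 100)
ranking = retrieve_texts(query, texts)
```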
Keywords: cross-media retrieval, restricted Boltzmann machines, sparse neural networks, modality-reconstructed
1. Introduction
Cross-media retrieval is becoming a major trend in information retrieval. With the coming of the big data era, multi-modal data is growing rapidly, and the retrieval of single-modal data cannot satisfy users' needs in many domains. For example, when we retrieve information about the Great Wall on the Internet, we may take a photo and submit it as a query; what we want is not only similar images but also relevant textual materials. Consequently, cross-media retrieval came into being. In this paper, we concentrate mainly on cross-media retrieval between images and texts, which involves two tasks: given a query image, retrieve the matching texts; and given a query text, retrieve the matching images.
Fig. 1. The method of subspace learning for cross-modal retrieval.

Traditional information retrieval is text-based (e.g.,
search engines such as Google, Baidu, and Bing) or content-based [1–3] (e.g., retrieval systems such as SpeechBot [4], VideoQ [5], and SIMPLicity [6]). Text-based retrieval relies on keywords annotated by humans; content-based retrieval was then proposed in the 1990s. However, both are single-modality-based and do not satisfy the needs of information retrieval. Consequently, cross-media retrieval is becoming more and more popular.
Essentially, the fundamental challenge of cross-media retrieval is the heterogeneity gap between different media data. For example, it is difficult to measure the content similarity between an image with 1000-dimensional visual features and a text with 100-dimensional textual features; although they may share the same semantics, it is not easy to find the relationship between them. A straightforward approach, subspace learning, maps the visual features of images and the textual features of texts into an isomorphic subspace to learn a common representation, using a pair of projection matrices, so that the two modalities can be measured directly (as shown in Fig. 1). Canonical Correlation Analysis (CCA) is a classic method that learns a subspace of the same dimensionality by maximizing the correlations between different modal data [7]; it is unsupervised.
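As a concrete illustration of this subspace-learning baseline, the sketch below uses scikit-learn's CCA to project hypothetical 1000-dimensional visual features and 100-dimensional textual features (random placeholders, not real data) into a shared subspace where cross-modal cosine similarity is directly comparable:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Random placeholders standing in for paired data: 500 image-text
# pairs, 1000-dim visual features and 100-dim textual features.
rng = np.random.default_rng(0)
X_img = rng.standard_normal((500, 1000))
X_txt = rng.standard_normal((500, 100))

# Learn two projection matrices that maximize the correlation
# between the modalities in a shared 10-dim subspace.
cca = CCA(n_components=10)
cca.fit(X_img, X_txt)
Z_img, Z_txt = cca.transform(X_img, X_txt)

# In the common subspace, cross-modal similarity is directly
# meaningful, e.g., between image i and every text.
i = 0
sims = (Z_txt @ Z_img[i]) / (
    np.linalg.norm(Z_txt, axis=1) * np.linalg.norm(Z_img[i]) + 1e-12
)
ranking = np.argsort(-sims)  # most similar texts first
```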
The other unsu-