580 X. Wang, Y. Yan and P. Tang et al. / Information Sciences 504 (2019) 578–588
features should be updated, which is time-consuming. Thus, we propose a decoupled training scheme: we first train an MI-Net to obtain all instance features; then, we fix the neural features of reference instances and update only the features of target instances.
In summary, the contributions of this study are as follows:
• We propose a learnable bag similarity representation for MIL. To the best of our knowledge, this is the first study that
integrates similarity learning with multi-instance neural networks.
• To solve bag similarity learning problems, we propose a novel bag similarity network that takes (N + 1) × M streams as input. For effective training, we propose a decoupled training scheme.
• The proposed BSN method has achieved state-of-the-art performance on several different MIL tasks.
2. Related work
2.1. Multi-instance learning
MIL has long been an active research topic owing to its ability to handle weakly labeled data. Utilizing weakly la-
beled data is highly important, because labeling for big data is costly. MIL has been applied in various computer vision
[17,18,28,33,41] and medical image analysis problems [16,35]. For example, in object detection, Wang et al. [30] formulated the problem of weakly supervised object detection as an MIL problem and proposed a relaxed MIL solution that uses deep-learning features as the instance representation. Cinbis et al. [5] proposed a multi-fold MIL procedure to avoid poor local optima.
Tang et al. [26] proposed a bag-in-bag formulation for modeling contextual information around objects. Investigating new
MIL methods is essential for understanding weakly labeled data.
In MIL, we are given a set of bags $X = \{X_1, X_2, \ldots, X_N\}$. Each bag $X_i$ can be represented by distinct instances $X_i = \{x_{i1}, x_{i2}, \ldots, x_{im_i}\}$, where $x_{ij}$ denotes the $j$-th instance in bag $X_i$ and $m_i$ denotes the number of instances in this bag. We assume that $Y_i \in \{0, 1\}$ and $y_{ij} \in \{0, 1\}$ represent the label of bag $X_i$ and the label of instance $x_{ij}$, respectively. During the training phase, only bag labels are available, whereas instance labels are unknown. There are two standard MIL constraints regarding bag and instance labels: if $Y_i = 0$, then all instances in the corresponding bag $X_i$ are negative; otherwise, at least one instance $x_{ij} \in X_i$ is positive.
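The standard MIL assumption above amounts to a logical OR over instance labels: a bag is positive if and only if it contains at least one positive instance. A minimal sketch (the toy bags and their instance labels $y_{ij}$ below are illustrative, not from the paper):

```python
def bag_label(instance_labels):
    """Standard MIL assumption: Y_i = 1 iff at least one
    instance label y_ij in bag X_i equals 1."""
    return int(any(y == 1 for y in instance_labels))

# Toy bags (hypothetical instance labels y_ij).
negative_bag = [0, 0, 0]   # Y_i = 0: all instances negative
positive_bag = [0, 0, 1]   # Y_i = 1: at least one positive instance

print(bag_label(negative_bag))  # 0
print(bag_label(positive_bag))  # 1
```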
2.2. Multi-instance neural network
In recent years, neural networks have become the most effective method for addressing MIL problems. Ilse et al. [13] added an attention module to multi-instance neural networks for instance selection and obtained impressive results for cancer detection in histopathology images. Even in the multi-label setting, Feng et al. [8] confirmed that deep neural networks are effective.
MI-Net [29] is a typical multi-instance neural network that focuses on MIL problems. MI-Net contains $L$ fully connected (FC) layers and one MIL pooling layer (generally, $L$ is equal to 4). The first $L - 1$ FC layers are followed by a non-linear transformation such as the rectified linear unit (ReLU) [10], which learns the representations of all instances in the corresponding bag. Here, $x_{ij}^{\ell}$ denotes the $\ell$-th layer output of the $j$-th instance $x_{ij}$ in bag $X_i$. The MIL pooling layer is used to map all instance-level features to obtain bag-level representations. Three widely used pooling schemes $M(x_{ij}^{L-1}\,|_{j=1 \ldots m_i})$ are mentioned in [29]: 1) max pooling $M(x_{ij}^{L-1}\,|_{j=1 \ldots m_i}) = \max_j x_{ij}^{L-1}$; 2) mean pooling $M(x_{ij}^{L-1}\,|_{j=1 \ldots m_i}) = \frac{1}{m_i} \sum_{j=1}^{m_i} x_{ij}^{L-1}$; and 3) log-sum-exp (LSE) pooling $M(x_{ij}^{L-1}\,|_{j=1 \ldots m_i}) = \frac{1}{r} \log\left[\frac{1}{m_i} \sum_{j=1}^{m_i} \exp\left(r \cdot x_{ij}^{L-1}\right)\right]$, where $r$ is a parameter controlling the smoothness of the approximation to the max function. Thus, regardless of the number of input instances, the MIL pooling layer aggregates them into a bag-level representation. Finally, the probability of a bag being positive can be calculated by an FC layer with only one neuron and sigmoid activation, and then the bag label is predicted.
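The three pooling schemes can be sketched in NumPy as follows. This is a minimal illustration: the matrix `h` is a toy stand-in for the layer-$(L-1)$ outputs $x_{ij}^{L-1}$ of one bag, and the LSE computation is written in the standard numerically stable form (subtracting the per-dimension maximum before exponentiating), which is algebraically equivalent to the formula above:

```python
import numpy as np

# Instance features of one toy bag: m_i = 3 instances, 2 feature dimensions.
h = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 0.5]])

def max_pool(h):
    # Element-wise maximum over the instances of the bag.
    return h.max(axis=0)

def mean_pool(h):
    # Element-wise average over the instances of the bag.
    return h.mean(axis=0)

def lse_pool(h, r=1.0):
    # Log-sum-exp pooling in a numerically stable form.
    # Larger r -> closer to max pooling; r -> 0 recovers mean pooling.
    z = r * h
    zmax = z.max(axis=0)
    return (zmax + np.log(np.mean(np.exp(z - zmax), axis=0))) / r

# As r grows, LSE pooling approaches max pooling.
assert np.allclose(lse_pool(h, r=1000.0), max_pool(h), atol=1e-2)
```

All three operators map the variable-size set of instance features to a single fixed-length vector, which is what lets the network handle bags with different numbers of instances.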
The proposed BSN is also based on neural networks. However, unlike previous multi-instance networks that learn a bag
embedding without considering the bag’s relation to other bags, BSN learns a bag embedding by comparing the bag with
the other bags. Furthermore, BSN is different from traditional bag similarity methods that use fixed bag similarity metrics,
as it learns bag similarity using neural networks.
In addition, BSN can be regarded as a special instantiation of memory-augmented neural networks [23] , which are widely
used in meta-learning. Here, memory refers to external memory and is different from the internal memory in long short-
term memory (LSTM) networks [11] . The reference training bags with their feature extraction networks can be considered
external memory in BSN.
3. Bag similarity network for MIL
Unlike traditional methods, the proposed method addresses MIL problems from the new perspective of bag similarity learning.
In the proposed design, each bag is represented by a vector of its similarities to other bags in the training set, and these
similarities are treated as a bag-level representation, hence the term bag similarity network. Fig. 2 shows the overall architecture of BSN, where it can be seen that to avoid the complications of updating parameters and reduce computational