the relative change of the objective function between two consecutive
iterations is less than a predefined threshold.
The rest of this paper is organized as follows. In Section 2, we give
a brief review of several existing multiple-instance feature extraction
algorithms and discuss their relationships to our work. In Section 3,
we introduce B-MIDA and discuss how to optimize it. In Section 4,
we extend B-MIDA to the multi-class case to obtain M-MIDA, and
then give the optimization of M-MIDA. In Section 5, we compare
B-MIDA and M-MIDA with some competing algorithms via empirical
experiments conducted on synthetic and real-world datasets.
Finally, we give concluding remarks and discuss future work in
Section 6.
2. Related algorithms
In this section, we give a brief review of three multiple-instance
dimensionality reduction algorithms: MIDR [26], MidLABS [27], and
CLFDA [28], discuss their relationships to our algorithms, and analyze
their time complexities. Note that B-MIDA and M-MIDA share the
same design principles; the only difference between them is that
B-MIDA is for binary-class learning whereas M-MIDA is for multi-class
learning. In the following discussions, for simplicity, we use MIDA
to refer to both of them when this causes no ambiguity.
2.1. MIDR
MIDR aims at making the posterior probability of a bag being
positive close to one if the bag is truly positive and close to zero
otherwise. The objectives of MIDR and MIDA are highly different
from each other: MIDR minimizes the sum of squared losses between
the above posteriors and the binary bag labels, whereas MIDA
maximizes the difference between the between-class scatterings and
the within-class ones. One point in common is that both MIDR and
MIDA contain a process of seeking positive prototypes, although
MIDA performs the seeking explicitly, while MIDR performs it
implicitly (it is involved in the calculation of the above posterior
probabilities).
Next we analyze the time complexity of MIDR. Suppose the
transformation matrix A to be calculated in MIDR is of size D × d.
MIDR uses gradient descent to update A, and the update consists of
two kinds of iterations: the outer iteration and the inner one. In
the outer iteration, the main work is to calculate the gradient of the
objective function w.r.t. A, during which we need to calculate the
gradient of the posterior of each instance being positive w.r.t. A,
sum these gradients to get the total gradient, and project the
total gradient onto the tangent space. The time complexity of each
outer iteration (without considering the inner iteration contained in it)
is O(n_sum D d^2) + O(D^2 d^4), where n_sum denotes the number of all
instances in all bags, O(n_sum D d^2) is the time complexity of calculating
the gradients, and O(D^2 d^4) is the time complexity of projecting the
total gradient onto the tangent space. In each outer iteration, there is also
an inner iteration which is adopted to tune the step size of the
gradient update, and the time complexity of each inner iteration is
approximately the same as that of each outer one (without considering
the inner iteration). Let t_in and t_out respectively denote the average
number of inner iterations and the number of outer iterations; then
the overall time complexity of solving MIDR is
O(t_in t_out n_sum D d^2 + t_in t_out D^2 d^4).
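The two-level iteration structure described above (an outer gradient step with tangent-space projection, plus an inner loop tuning the step size) can be sketched as follows. This is a minimal illustration on a toy trace objective over orthonormal matrices, not MIDR's actual objective or gradient; all names and the backtracking scheme are our own simplifications.

```python
import numpy as np

# Hedged sketch of MIDR-style optimization: gradient ascent on a
# transformation matrix A (D x d) with an inner backtracking line
# search that tunes the step size.  The objective, gradient, and
# tangent-space projection are simplified stand-ins for MIDR's.

D, d = 10, 3
rng = np.random.default_rng(0)
S = rng.standard_normal((D, D))
S = S @ S.T  # a symmetric PSD matrix as a toy "scatter" term

def objective(A):
    # toy objective: trace(A^T S A); MIDR's real objective is a sum
    # of squared losses between bag posteriors and bag labels
    return np.trace(A.T @ S @ A)

def gradient(A):
    # corresponds to the O(n_sum D d^2) gradient-evaluation term
    return 2.0 * S @ A

def project_tangent(A, G):
    # project G onto the tangent space of the orthonormal (Stiefel)
    # manifold at A; this is the step behind the O(D^2 d^4) term
    return G - A @ (A.T @ G + G.T @ A) / 2.0

A = np.linalg.qr(rng.standard_normal((D, d)))[0]  # orthonormal init
f_init = objective(A)
for t_out in range(50):                 # outer iterations
    G = project_tangent(A, gradient(A))
    step, f0, improved = 1.0, objective(A), False
    for t_in in range(20):              # inner iterations: tune step size
        A_new = np.linalg.qr(A + step * G)[0]  # retract to the manifold
        if objective(A_new) > f0 + 1e-12:      # ascent step accepted
            improved = True
            break
        step *= 0.5                     # backtrack
    if not improved:
        break                           # no improving step: converged
    A = A_new
```

Each outer pass costs one gradient evaluation plus one projection, and each inner pass costs roughly the same again, which is where the t_in t_out factors in the overall complexity come from.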
2.2. MidLABS
Both MIDA and MidLABS simultaneously maximize the between-
class scatterings and minimize the within-class ones; hence both
of them can be treated as multiple-instance extensions of LDA.
One obvious difference between them is that MIDA utilizes the
trace-difference formulation while MidLABS utilizes the trace-ratio
one. However, since Guo et al. [31] have shown that the trace-
difference formulation is very close to the corresponding trace-ratio
one (one important conclusion of [31] shows that the transformation
matrix of the trace-difference problem is the same as that of the
corresponding trace-ratio problem, as long as the trade-off para-
meter (in our case, α) of the trace-difference problem equals the
optimal ratio of the corresponding trace-ratio problem; the detailed
proof of this conclusion can be found in Theorem 2 of [31]), the
difference in formulations is not the major one between MIDA and
MidLABS. One major difference is that they construct scattering
matrices at different levels. MIDA constructs scattering matrices
at the instance level, i.e., it selects a prototype for each bag and
utilizes the prototype as the representative of this bag to construct
scattering matrices. In contrast, MidLABS constructs scattering
matrices at the bag level by directly evaluating the scatterings
among bags. The other major difference is that MidLABS takes the
structural information of data into account, whereas MIDA does not.
MidLABS treats instances in each bag as non-i.i.d., i.e., it
considers the relationship among instances in each bag by measuring
their pairwise distances and building an edge between two instances
if their distance is smaller than a threshold, and then utilizes these
edges to describe the structural information among within-bag
instances. In contrast, MIDA treats instances in the same class as
i.i.d., i.e., it utilizes the selected positive instances (positive
prototypes) and the mean vectors of negative instances (negative
prototypes) to construct scattering matrices. In short, similar to LDA,
MIDA does not consider the structural information of data either,
because it pays no attention to the relationship among within-bag
instances.
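The trace-ratio/trace-difference equivalence of Guo et al. [31] invoked above can be illustrated numerically. The following is a minimal sketch on random toy scatter matrices (not the authors' code): the classic fixed-point scheme repeatedly sets the trade-off parameter to the current ratio and re-solves the trace-difference problem, converging to the optimal ratio at which the two solutions coincide.

```python
import numpy as np

# Minimal numerical sketch (toy matrices, not from the paper) of the
# Guo et al. [31] equivalence: the trace-difference problem
#   max tr(A^T (S_b - lam S_w) A)  s.t. A^T A = I
# shares its solution with the trace-ratio problem
#   max tr(A^T S_b A) / tr(A^T S_w A)
# when lam equals the optimal ratio.

rng = np.random.default_rng(1)
D, d = 8, 2
B = rng.standard_normal((D, D)); S_b = B @ B.T              # toy between-class
W = rng.standard_normal((D, D)); S_w = W @ W.T + np.eye(D)  # toy within-class

def trace_diff_solution(lam):
    # trace-difference solver: top-d eigenvectors of S_b - lam * S_w
    vals, vecs = np.linalg.eigh(S_b - lam * S_w)
    return vecs[:, -d:]  # eigh sorts ascending; take the d largest

def ratio(A):
    return np.trace(A.T @ S_b @ A) / np.trace(A.T @ S_w @ A)

# Fixed-point iteration: set lam to the current ratio, re-solve.
lam = 0.0
for _ in range(100):
    A = trace_diff_solution(lam)
    lam_new = ratio(A)
    if abs(lam_new - lam) < 1e-12:
        break
    lam = lam_new

# At convergence the trace-difference value at the optimum is
# (numerically) zero: tr(A^T S_b A) - lam * tr(A^T S_w A) ~ 0,
# i.e. the trace-difference solution is also the trace-ratio maximizer.
gap = np.trace(A.T @ (S_b - lam * S_w) @ A)
```

This is exactly the situation of Theorem 2 in [31]: with the trade-off parameter (here lam, playing the role of α) pinned to the optimal ratio, the two formulations pick out the same transformation matrix.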
The design of MidLABS consists of two steps: constructing scatter-
ing matrices and performing eigenvalue decomposition. There are two
kinds of scattering matrices in MidLABS, i.e., the node matrices and the
edge matrices. Let l denote the number of all bags and n_ave denote the
average number of instances in each bag; then the time complexity of
constructing the node matrices is O(l^2 n_ave^2 D^2). Before the construction
of the edge matrices, the Euclidean distance between each pair of
within-bag instances should be calculated, and the time complexity of
calculating these Euclidean distances is O(l n_ave^2 D). After that, we may
construct the edge matrices, and the time complexity of this process is
O(l^2 n_ave^4 D^2), which is approximately n_ave^2 times that of constructing
the node matrices, because the number of edges in a bag is usually
the square of the number of nodes in this bag. Therefore, the overall
time complexity of constructing scattering matrices is
O(l^2 n_ave^2 D^2 + l n_ave^2 D + l^2 n_ave^4 D^2), which can be approximated as O(l^2 n_ave^4 D^2). The
time complexity of performing eigenvalue decomposition is O(D^3).
Hence, the overall time complexity of solving MidLABS can be
approximated as O(l^2 n_ave^4 D^2 + D^3).
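The within-bag edge construction behind these counts can be sketched as follows. This is a minimal illustration with our own toy data and threshold (not MidLABS's actual settings), showing why the number of edges per bag, and hence the cost of the edge matrices, grows roughly with the square of the number of instances.

```python
import numpy as np

# Hedged sketch of MidLABS-style within-bag edge construction: connect
# two instances of the same bag when their Euclidean distance is below
# a threshold.  Bag sizes and the threshold are illustrative only.

rng = np.random.default_rng(2)
D = 5
# a toy dataset: 3 bags with 4, 6, and 5 instances (rows) each
bags = [rng.standard_normal((n, D)) for n in (4, 6, 5)]
threshold = 2.0

edges_per_bag = []
for X in bags:
    n = X.shape[0]
    # pairwise Euclidean distances within the bag: O(n^2 D) per bag,
    # O(l n_ave^2 D) overall
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    # an edge (i, j), i < j, whenever the distance is under the threshold
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if dist[i, j] < threshold]
    edges_per_bag.append(edges)

# each bag yields at most n*(n-1)/2 edges, i.e. O(n_ave^2) edges, which
# is why the edge matrices cost ~n_ave^2 times more than the node ones
```

Since every bag can contribute on the order of n_ave^2 edges versus n_ave nodes, replacing node pairs by edge pairs in the scatter construction multiplies the O(l^2 n_ave^2 D^2) node-matrix cost by roughly n_ave^2, giving the O(l^2 n_ave^4 D^2) term above.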
2.3. CLFDA
CLFDA [28] performs multiple-instance dimensionality reduc-
tion by incorporating the citation and reference information [19]
into local Fisher discriminant analysis [32], thus it can be treated
as the multiple-instance extension of LDA as well. The central idea
of CLFDA and that of MIDA are complementary to each
other, because MIDA tries to seek correctly labeled instances in
positive bags (i.e., positive prototypes), whereas CLFDA tries to
detect incorrectly labeled instances in positive bags (i.e., false
positive instances). In order to detect false positive instances,
CLFDA first pre-labels all instances with their bag labels, and then
adopts the neighborhood information among them to detect the
false positive ones. However, the rationality of the pre-labeling
process is questionable, because simply treating all instances in
positive bags as positive instances is usually not a reasonable
J. Chai et al. / Pattern Recognition 47 (2014) 2517–2531 2519