Double Rotation Margin Forest: Optimizing Diversity and Margin Distribution in Ensemble Learning

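
"Exploiting diversity for optimizing margin distribution in ensemble learning"

Ensemble learning is a powerful machine learning approach that combines multiple classifiers or predictive models to improve overall performance. The margin distribution is a key indicator of how such a combined system behaves: it reflects how confidently samples are classified relative to the decision boundary. A good margin distribution usually implies better generalization, that is, better performance on unseen data.

This paper proposes a novel ensemble learning algorithm named Double Rotation Margin Forest (DRMF). The core idea of DRMF is to generate diverse base classifiers through random rotation and thereby optimize the margin distribution of the combined system. Random rotation is a data preprocessing technique that changes the orientation of the feature space, which can expose otherwise hidden patterns or lead different classifiers to view the data differently.

DRMF works in the following key steps:

1. **Data rotation**: The original data set is randomly rotated to create different views or representations, which helps introduce diversity.
2. **Base classifier generation**: A base classifier is trained on each rotated view of the data. Because each classifier sees a different version of the data, the classifiers are likely to have different strengths and weaknesses.
3. **Margin distribution optimization**: DRMF optimizes the margin distribution over these base classifiers, so that their differing behavior near the decision boundary is exploited to improve overall predictive accuracy.
4. **Fusion strategy**: Finally, DRMF combines the outputs of the base classifiers into a final prediction. The fusion strategy may be voting, weighted averaging, or another method, with the goal of making the most of each classifier's strengths.

Experiments on a wide range of benchmark classification tasks show that DRMF outperforms classical ensemble algorithms such as Bagging, AdaBoostM1, and Rotation Forest. Bagging creates diverse classifiers through bootstrap sampling, while AdaBoostM1 boosts weak classifiers by iteratively reweighting the training data. Rotation Forest, like DRMF, uses rotation to increase diversity, but DRMF improves on it through double rotation and margin optimization.

The success of DRMF can be explained from two angles. First, it exploits diversity effectively, with different base classifiers capturing different aspects of the data. Second, it optimizes the margin distribution, which gives the model stronger generalization on complex and varied data.

DRMF offers a new perspective on ensemble learning that emphasizes the key roles of margin distribution and diversity, and its effectiveness is validated experimentally. It is of theoretical and practical significance for improving machine learning models, particularly on high-dimensional and complex data sets.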

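The workflow above can be illustrated with a deliberately simplified sketch. It assumes 2-D data, one random planar rotation per base classifier, decision stumps as base learners, and an unweighted majority vote. None of these choices come from the paper, DRMF's double rotation and margin optimization steps are not reproduced, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def rotate(point, angle):
    """Rotate a 2-D point counter-clockwise by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def fit_stump(X, y):
    """Fit an axis-aligned decision stump by exhaustive search.

    Returns (feature, threshold, left_label, right_label) with the
    lowest training error; points with value <= threshold go left.
    """
    labels = sorted(set(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted({p[f] for p in X}):
            for left in labels:
                for right in labels:
                    errors = sum(
                        (left if p[f] <= t else right) != lab
                        for p, lab in zip(X, y)
                    )
                    if best is None or errors < best[0]:
                        best = (errors, f, t, left, right)
    return best[1:]


def stump_predict(stump, point):
    """Predict the label of one point with a fitted stump."""
    f, t, left, right = stump
    return left if point[f] <= t else right


def fit_rotated_ensemble(X, y, n_estimators=11, seed=0):
    """Train one stump per randomly rotated view of the data."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_estimators):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        rotated = [rotate(p, angle) for p in X]
        ensemble.append((angle, fit_stump(rotated, y)))
    return ensemble


def ensemble_predict(ensemble, point):
    """Unweighted majority vote over all rotated stumps."""
    votes = Counter(
        stump_predict(stump, rotate(point, angle)) for angle, stump in ensemble
    )
    return votes.most_common(1)[0][0]
```

Stumps are used here only because an axis-aligned split is sensitive to rotation, so each rotated view yields a different decision boundary in the original space.
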
and robustness is presented in Section 4. Section 5 presents the experimental results and explores the rationality of DRMF. Finally, Section 6 offers conclusions and future work.

2. Related work

Assume that $x_i = [x_{i1}, \ldots, x_{in}]^T$ is a sample represented by a set $F$ of $n$ features and that every sample is generated independently at random according to some fixed but unknown distribution $D$. Let $X$ be an $N \times n$ matrix containing the training set and $Y = [y_1, \ldots, y_N]^T$ be an $N$-dimensional vector containing the class labels for the data, where $y_i$ is the class label of $x_i$ from the set of class labels $\{\omega_1, \ldots, \omega_c\}$. Let $\{C_1, \ldots, C_L\}$ be the set of base classifiers in an ensemble. In this paper, our aim is to obtain an ensemble system with small generalization error via optimizing the margin distribution. Here, the generalization error of a classifier $C_j$ is the probability of $C_j(x) \neq y$ when an example $(x, y)$ is chosen at random according to the distribution $D$, denoted as $P_D[C_j(x) \neq y]$. The margin distribution is a function of $\theta$ which gives the fraction of samples whose margin is smaller than $\theta$. A good margin distribution means that most examples have large margins.
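
As an illustration of the last definition, the empirical margin distribution can be evaluated as below. This is a minimal sketch: the function name `margin_distribution` is hypothetical, and the per-sample margins are assumed to be available already as plain numbers.

```python
def margin_distribution(margins, theta):
    """Fraction of samples whose margin is strictly smaller than theta."""
    if not margins:
        raise ValueError("margins must be non-empty")
    return sum(m < theta for m in margins) / len(margins)


# Example with four hypothetical margins.
example_margins = [-0.5, 0.2, 0.6, 1.0]
print(margin_distribution(example_margins, 0.5))
```

Under this reading, a good margin distribution is one whose value stays low until $\theta$ becomes large, since few samples then have small margins.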

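Definition 1. Given $x_i \in X$, $h_{ij}$ $(j = 1, 2, \ldots, L)$ is the output of $x_i$ from $C_j$. We define

$$d_{ij} = \begin{cases} 1, & \text{if } y_i = h_{ij} \\ -1, & \text{if } y_i \neq h_{ij} \end{cases} \tag{1}$$

where $y_i$ is the real class label of $x_i$. From this definition, we know that $d_{ij} = 1$ if $x_i$ is correctly classified by $C_j$; otherwise $d_{ij} = -1$.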

Definition 2 [42]. Given $x_i \in X$, the margin of $x_i$ in terms of the ensemble is defined as

$$m(x_i) = \sum_{j=1}^{L} w_j d_{ij}, \tag{2}$$

where $w_j$ is the weight of $C_j$ and $w_j > 0$.

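Definitions 1 and 2 translate directly into code. The sketch below is illustrative only: the function names and the nested-list layout (`outputs[i][j]` holds $h_{ij}$) are assumptions made here, not part of the paper.

```python
def correctness(y_true, outputs):
    """Return d[i][j] = 1 if classifier j labels sample i correctly, else -1."""
    return [
        [1 if h == y else -1 for h in row]
        for y, row in zip(y_true, outputs)
    ]


def ensemble_margins(y_true, outputs, weights):
    """Return m(x_i) = sum_j w_j * d_ij for every sample i."""
    if any(w <= 0 for w in weights):
        raise ValueError("all weights must be positive")
    d = correctness(y_true, outputs)
    return [sum(w * d_ij for w, d_ij in zip(weights, row)) for row in d]


# Two samples, three base classifiers.
y_true = [0, 1]
outputs = [[0, 0, 1], [1, 0, 0]]
weights = [0.5, 0.25, 0.25]
print(ensemble_margins(y_true, outputs, weights))
```

If the weights are normalized to sum to 1, each margin lies in $[-1, 1]$ and is positive exactly when the correctly voting classifiers carry more than half of the total weight.
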
In [42,51], it is shown that a small generalization error for a voting classifier can be obtained by a good margin distribution on the training set. Obviously, the performances of the base classifiers have a significant effect on the margin of $x_i$. At the same time, the diversity among base classifiers is another key factor. In [46], the underlying relationship between diversity and margin was analyzed.

Theorem 1 [46]. Let the average classification accuracy of the base classifiers be regarded as a constant. If maximum diversity is achievable, maximization of the diversity among base classifiers is equivalent to maximization of the minimal margin of the ensemble on the training samples.

It should be noted that our aim is not to maximize the minimal margin of the ensemble, but to optimize the margin distribution. We use a disagreement measure [30] to measure the diversity of the base classifiers in our approach. The diversity between classifiers $C_j$ and $C_k$ is thus computed as

$$Dis_{jk} = \frac{N^{10} + N^{01}}{N^{11} + N^{10} + N^{01} + N^{00}}, \tag{3}$$

where $N^{00}$ denotes the number of samples misclassified by both classifiers, $N^{11}$ is the number of samples correctly classified by both, $N^{10}$ denotes the number of samples which were correctly classified by $C_j$ but misclassified by $C_k$, and $N^{01}$ denotes the number of samples misclassified by $C_j$ but correctly classified by $C_k$. For multiple base classifiers, the overall diversity is computed as the average diversity of classifier pairs.
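
The disagreement measure and its pairwise average can be sketched as follows. The input format is an assumption made for illustration: each classifier is represented by a list of booleans stating whether it classified each sample correctly, and the function names are hypothetical.

```python
from itertools import combinations


def disagreement(correct_j, correct_k):
    """(N10 + N01) / N for two classifiers given per-sample correctness flags."""
    if len(correct_j) != len(correct_k) or not correct_j:
        raise ValueError("inputs must be non-empty and of equal length")
    # A sample counts toward N10 or N01 exactly when the two flags differ.
    differing = sum(a != b for a, b in zip(correct_j, correct_k))
    return differing / len(correct_j)


def average_disagreement(correct):
    """Mean disagreement over all unordered pairs of classifiers."""
    pairs = list(combinations(correct, 2))
    if not pairs:
        raise ValueError("need at least two classifiers")
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Three classifiers evaluated on four samples.
flags = [
    [True, True, False, False],
    [True, False, True, False],
    [True, True, False, False],
]
print(average_disagreement(flags))
```

The denominator of Eq. (3) is simply the total number of samples, because the four counts partition the data set.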

In [39], Rodríguez and Kuncheva designed a method to generate ensembles based on feature transformation. The diversity of base classifiers is promoted by random splits of the feature set into different subsets. The original feature space is split into $K$ subspaces (the subsets may be disjoint or may intersect). Then, PCA is applied to linearly rotate the subspaces along the "rotation" matrix. Diversity is obtained by random splits of the feature set.
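
The random feature split that drives diversity in this method can be sketched as follows. The sketch covers only the disjoint case and stops before the PCA step; `split_features` and its parameters are hypothetical names and are not taken from [39].

```python
import random


def split_features(n_features, k, seed=None):
    """Randomly partition feature indices 0..n_features-1 into k disjoint subsets."""
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and n_features")
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin so subset sizes differ by at most one.
    return [sorted(indices[i::k]) for i in range(k)]


# Each call with a different seed gives a different partition of the features.
print(split_features(10, 3, seed=1))
print(split_features(10, 3, seed=2))
```

In the method described above, PCA would then be applied within each subset to build that classifier's rotation; a different seed per base classifier is what makes the rotations differ.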

Cai et al. [12] proposed a supervised algorithm for feature transformation, which can find a projection that maximizes the margin between different classes. For $x_i \in X$, denote by $N(x_i) = \{x_i^1, \ldots, x_i^e\}$ the set of its $e$ nearest neighbors and by $y_i$ the class label of $x_i$. We define

$$N_w(x_i) = \{x_i^j \mid y_i^j = y_i,\ 1 \le j \le e\}, \tag{4}$$

and

$$N_b(x_i) = \{x_i^j \mid y_i^j \neq y_i,\ 1 \le j \le e\}, \tag{5}$$

where $y_i^j$ is the class label of the neighbor $x_i^j$, so that $N_w(x_i)$ contains the neighbors which share the same label with $x_i$, while $N_b(x_i)$ is the set of the neighbors which belong to the other classes.

For any $x_i$ and $x_j$, we define

$$W_{b,ij} = \begin{cases} 1, & \text{if } x_i \in N_b(x_j) \text{ or } x_j \in N_b(x_i) \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
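
Equations (4) to (6) can be made concrete with a small sketch. It assumes Euclidean distance, breaks distance ties by sample index, and excludes a sample from its own neighborhood; these choices and all function names are assumptions made for illustration rather than details given in [12].

```python
import math


def nearest_neighbors(X, i, e):
    """Indices of the e nearest neighbors of X[i], excluding i itself."""
    others = (j for j in range(len(X)) if j != i)
    ranked = sorted(others, key=lambda j: (math.dist(X[i], X[j]), j))
    return ranked[:e]


def neighbor_sets(X, y, e):
    """Return (within, between): per-sample index sets for Eqs. (4) and (5)."""
    within, between = [], []
    for i in range(len(X)):
        nn = nearest_neighbors(X, i, e)
        within.append({j for j in nn if y[j] == y[i]})
        between.append({j for j in nn if y[j] != y[i]})
    return within, between


def between_class_weights(between):
    """Build the 0/1 matrix of Eq. (6) from the between-class neighbor sets."""
    n = len(between)
    return [
        [1 if (i in between[j] or j in between[i]) else 0 for j in range(n)]
        for i in range(n)
    ]


# Four points on a line, two classes, e = 2 neighbors.
X = [(0, 0), (1, 0), (2, 0), (10, 0)]
y = [0, 0, 1, 1]
within, between = neighbor_sets(X, y, e=2)
W_b = between_class_weights(between)
```

The "or" in Eq. (6) makes the matrix symmetric even though the nearest-neighbor relation itself is not.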

Table 3
Classification performance with different numbers of candidate base classifiers.

| Data set | L = 20 | L = 40 | L = 60 | L = 80 | L = 100 |
|---|---|---|---|---|---|
| Australian | 86.96 ± 3.01 | 88.13 ± 3.98 | 87.54 ± 2.90 | 87.97 ± 3.27 | 88.11 ± 3.48 |
| Crx | 85.66 ± 14.10 | 85.51 ± 14.66 | 85.64 ± 15.10 | 86.81 ± 13.20 | 86.37 ± 13.66 |
| Cmc | 52.82 ± 3.41 | 53.97 ± 3.30 | 53.70 ± 3.28 | 54.18 ± 2.95 | 54.24 ± 3.25 |
| Derm | 96.47 ± 3.90 | 96.98 ± 4.24 | 95.91 ± 5.26 | 96.47 ± 4.12 | 96.75 ± 3.84 |
| German | 75.40 ± 3.60 | 77.00 ± 3.40 | 77.00 ± 3.02 | 77.70 ± 2.45 | 77.80 ± 2.94 |
| Glass | 72.44 ± 12.14 | 74.44 ± 11.43 | 74.89 ± 13.25 | 78.14 ± 10.91 | 76.64 ± 10.61 |
| Heart | 83.33 ± 3.60 | 83.70 ± 4.68 | 84.81 ± 4.43 | 84.81 ± 4.77 | 84.44 ± 4.88 |
| Horse | 92.95 ± 4.06 | 92.94 ± 4.06 | 93.49 ± 3.38 | 93.22 ± 3.39 | 93.49 ± 3.83 |
| ICU | 94.04 ± 4.69 | 94.04 ± 4.69 | 93.56 ± 4.80 | 94.09 ± 5.21 | 93.56 ± 4.80 |
| Iono | 93.20 ± 4.00 | 92.92 ± 4.60 | 93.49 ± 4.95 | 93.77 ± 5.09 | 93.47 ± 4.76 |
| Iris | 94.67 ± 5.26 | 94.67 ± 5.26 | 94.67 ± 5.26 | 96.67 ± 4.71 | 98.67 ± 2.81 |
| Movement | 80.56 ± 15.15 | 82.78 ± 16.50 | 82.78 ± 16.57 | 82.44 ± 16.29 | 82.44 ± 16.29 |
| Pima | 77.87 ± 4.98 | 78.52 ± 3.90 | 78.39 ± 4.53 | 78.39 ± 4.32 | 78.78 ± 3.76 |
| Rice | 89.73 ± 10.18 | 88.82 ± 10.46 | 90.73 ± 12.86 | 89.82 ± 13.17 | 89.82 ± 13.17 |
| Spectf | 82.61 ± 5.18 | 83.34 ± 4.39 | 82.12 ± 7.21 | 83.25 ± 7.01 | 82.48 ± 7.91 |
| Thyroid | 94.83 ± 6.10 | 95.30 ± 6.31 | 94.83 ± 5.21 | 95.78 ± 5.19 | 96.26 ± 4.86 |
| Wiscon | 97.34 ± 2.59 | 97.43 ± 2.59 | 97.71 ± 2.15 | 97.34 ± 2.59 | 97.86 ± 2.36 |
| Wdbc | 97.19 ± 2.06 | 97.72 ± 1.66 | 98.43 ± 1.53 | 97.72 ± 1.66 | 97.72 ± 1.66 |
| Yeast | 73.11 ± 3.26 | 73.45 ± 3.12 | 73.45 ± 3.64 | 73.38 ± 3.61 | 73.25 ± 3.47 |
| Zoo | 94.39 ± 8.39 | 94.39 ± 8.39 | 94.39 ± 8.39 | 94.39 ± 8.39 | 94.39 ± 8.39 |
| Average | 85.78 | 86.30 | 86.38 | 86.82 | 86.83 |

92 Q. Hu et al. / Knowledge-Based Systems 67 (2014) 90–104
