增强多类AdaBoost算法：应对标签错误的噪声数据

69 浏览量更新于2024-07-15 收藏 1.29MB PDF 举报

"这篇论文提出了一种针对标签错误的嘈杂数据的健壮多类AdaBoost算法，旨在解决传统AdaBoost在多类分类任务中遇到的挑战，包括不平衡的训练集、不适用的两类损失函数以及过度拟合问题。" 在机器学习领域，AdaBoost是一种著名的集成学习算法，通过迭代生成一系列弱分类器并结合它们的预测结果来构建一个强分类器。然而，当训练数据中存在标签错误时，AdaBoost可能会过拟合，导致泛化能力下降和模型的不稳定性。为了增强 AdaBoost 对这类问题的鲁棒性，有研究提出了基于噪声检测的AdaBoost (ND_AdaBoost)，它在二类分类任务中表现良好。但ND_AdaBoost在处理多类分类问题时遇到了困难，主要由于以下几个原因： 1. 多类分类通常通过一对一或一对多策略转化为多个二类问题，而这些转化后的问题往往存在训练样本的不平衡，这对ND_AdaBoost的性能产生了负面影响。 2. 直接将ND_AdaBoost应用到多类场景下，其原有的二类损失函数不再适用，且对基础分类器的准确性要求过高（大于0.5），这在实际情况中难以满足。 3. ND_AdaBoost依然存在过拟合的风险，因为它会增加正确分类的噪声样本的权重，使得后续迭代可能过于关注学习这些噪声样本。鉴于这些问题，论文提出了一个新的鲁棒多类AdaBoost算法。该算法可能采用了适应多类环境的新损失函数，改进了权重分配策略，以避免对噪声样本的过度依赖，并可能引入了新的噪声检测机制，以更有效地识别和处理标签错误的样本。此外，可能还包含了防止过拟合的策略，例如正则化或早停技术，以确保模型在复杂数据集上的泛化性能。这篇论文的研究旨在提升AdaBoost在面对多类分类和噪声数据时的性能，这对于现实世界中的机器学习应用，特别是在数据质量不佳的情况下，具有重要的理论和实践价值。通过这样的优化，可以构建出更强大、更稳定的分类模型，从而更好地服务于各种领域的智能决策系统。

90 B. Sun et al. / Knowledge-Based Systems 102 (2016) 87–1 02

Input : Training data set: D

= { (x

, y

) , . . . , (x

, y

) } , where

∈ R

(d > 1) , y

∈ Y = {−1 , 1 } , i = 1 , 2 , . . . , N; Base

classiﬁer learning algorithm: Learn ; Ensemble size:

T; Maximum iteration time: I .

Output : Ensemble classiﬁer: f



)





t=1

, where T



the actual number of generated base classiﬁers.

• Init ializat ion

1 Weights of training examples: w

= 1 /N, i = 1 , 2 , . . . , N;

t = 0 ; iter = 0 .

• Iteration pro cess

2 while (t < T ) and (iter < I) do

3 t = t + 1 ; iter = iter + 1 ;

4 Train a base classiﬁer g

on the training set D

with

weight distribution { w

iter

}

i =1

: g

= Learn (D

, { w

iter

}

i =1

) ;

5 Apply classiﬁer g

to classify training set D

;

6 Use the noise-detection function NDF to determine the

noise label φ

iter

) of training example (x

, y

) ,

i = 1 , 2 , . . . , N;

7 Calculate the weighted training error of g

err



) φ

iter

)= −1

iter

;

8 Compute the weight of classiﬁer g

: α

ln (

1 −err

err

) ;

9 if err

≥ 0 . 5 or err

= 0 then

10 Generate uniform weight distribution: w

iter+1

= 1 /N,

i = 1 , 2 , . . . , N;

11 Re-train the tth classiﬁer in the next iteration:

t = t − 1 ;

12 break;

13 end

14 Update the weights of training examples:

iter+1

= w

iter

exp



− α

) φ

iter

)



, i = 1 , 2 , . . . , N;

15 Normalize the weights of training examples:

iter+1



j=1

iter+1

, i = 1 , 2 , . . . , N;

16 end

Algorithm 1: The main steps of the ND_AdaBoost algorithm.

Fig. 1. A simple instance illustrating a training example’s neighbors.

examples is obtained: μ

iter



i =1

iter

, y

) . Finally, we deter-

mine the noise label of each example. For the i th (i = 1 , 2 , . . . , N)

example ( x

, y

), if its probability is greater than the average value:

iter

, y

) > μ

iter

, it is considered as a mislabeled noise and its

noise label is set to −1 : φ

iter

) = −1 ; otherwise, it is considered

as a non-noisy example, i.e., φ

iter

) = 1 . The general procedure of

the NDF is formally demonstrated in Algorithm 2

Input : Training data set: D

= { (x

, y

) , . . . , (x

, y

) };

Classiﬁcation

results of classiﬁer g

on training set D

{ g

) }

i =1

; Size of neighborhood: k .

Output : The detected noise labels of all the training

examples in the iterth iteration: { φ

iter

) }

i =1

• Determination pro cess

1 for each example (x

, y

) in D

2 Find its k nearest neighbors in D

: neighbors (x

, y

) =

{ (x

, y

) }

j=1

, where ij ∈ { 1 , 2 , . . . , N} , ij  = i , and the

Euclidean distance is used as the distance metric;

3 Calculate the probability of example (x

, y

) being a

mislabeled noise, i.e., the classiﬁcation error rate of

classiﬁer g

in its k nearest neighbors:

iter

, y

) =



j=1

I(g

)  = y

) ;

4 end

5 Calculate the average value over the obtained probabilities of

all the training examples: μ

iter



i =1

iter

, y

) ;

6 for each example (x

, y

) in D

7 if μ

iter

, y

) > μ

iter

then

8 Its noise label is set to −1: φ

iter

) = −1 , put it into

the noisy example set NS: NS = NS ∪ { (x

, y

) } ;

9 end

10 else if μ

iter

, y

) ≤ μ

iter

then

11 Its noise label is set to 1: φ

iter

) = 1 , put it into the

non-noisy example set N N S: N N S = N N S ∪ { (x

, y

) } ;

12 end

13 end

Algorithm 2: The procedure of the noise-detection function

NDF .

3.2. Multi-class AdaBoost algorithm SAMME

Although there already exist many multi-class AdaBoost algo-

rithms, such as SAMME [21] , AdaBoost.Cost [27] , AdaBoost.M1 [28] ,

AdaBoost.MO [29] , AdaBoost.MH [30] , AdaBoost.MR [30] , Ad-

aBoost.HM [31] , etc., we choose Zhu et al.’s multi-class AdaBoost

algorithm SAMME (Stagewise Additive Modeling using a M ulti-

class Exponential loss function) [21] as the foundation of our

algorithm. This is due to that SAMME is a very effective and

popular multi-class AdaBoost algorithm (has been cited 309 times

since 2009) and can be directly applied to the multi-class classi-

ﬁcation case without reducing it to multiple two-class problems.

In this section, we ﬁrst introduce the multi-class exponential loss

function of SAMME, then give its main steps.

3.2.1. Multi-class exponential loss function

In the multi-class classiﬁcation scenario, the class label y

example ( x

, y

) in training set D

= { (x

, y

) , . . . , (x

, y

) } ( x

∈

, y

∈ Y = { 1 , 2 , . . . , K} , i = 1 , 2 , . . . , N, K ≥ 3 is the number of

distinct class labels) is encoded by a K dimensional vector: Y

i 1

, Y

i 2

, . . . , Y

) , where each element in Y

is deﬁned as follows:



1 if y

= j,

−

K−1

if y

 = j.

(1)

It is easy to see that the K elements in Y

satisfy: Y

i 1

+ Y

i 2

+ ···+

= 0 .

Deﬁnition 1 (The multi-class exponential loss function) . Using the

above encoding strategy, the multi-class exponential loss function

剩余15页未读，继续阅读

weixin_38668672

粉丝: 6
资源: 907

增强多类AdaBoost算法：应对标签错误的噪声数据

adaboost算法

基于adaboost算法对气象数据的研究分析

adaboost算法数据集

分析AdaBoost算法的优缺点

数据挖掘AdaBoost算法课程设计

使用python实现AdaBoost算法并对鸢尾花数据集进行分类试验

python实现adaboost算法

OpenCV中ADABOOST算法

adaboost算法优缺点

adaboost算法下的多因子选股

最新资源