to the antigen. The clonal selection corresponds to an affinity
maturation process, meaning that immune individuals with high
affinity gradually proliferate during the cloning and mutation process.
At the same time, some immune individuals differentiate into
memory individuals.
Similar to AIS, evolutionary algorithms (EAs), such as
Genetic Algorithms (GA) (Park & Ryu, 2010), Evolution Strategies
(ES) (Huang, Chang, Hsieh, & Sandnes, 2011) and Differential Evolution
(DE) (Storn & Price, 1997), are all designed based on the basic
idea of biological evolution to control and optimize artificial systems.
Evolutionary computation shares many concepts with AIS, such as
a population, genotype-phenotype mapping, and proliferation of
the fittest. On the other hand, AIS models based on immune networks
resemble the structures and interactions of neural network
models. The key advantages of AIS over neural networks are the
benefits of a population of solutions and of evolutionary selection
pressure and mutation. Nevertheless, the underlying mechanisms
are fundamentally different in many respects. First and foremost,
the immune system is highly distributed, highly adaptive, self-
organizing, maintains a memory of past encounters and has the
ability to continuously learn from new encounters. AIS is a system
developed around the current understanding of the immune
system. Second, AIS is a general framework for a distributed adaptive
system and could, in principle, be applied to many domains.
Compared to most other evolutionary algorithms, AIS is much
simpler and more straightforward to implement, which is
important for practitioners from other fields. In addition, because
AIS is self-organizing, it requires far fewer system parameters
than other evolutionary computation methods. Some works have
also pointed out the similarities and the differences between AIS
and other heuristics (Aickelin, Dasgupta, & Gu, 2013; Castro &
Timmis, 2002; Zheng, Chen, & Zhang, 2010).
In recent years, there has been considerable interest in exploring
and exploiting the potential of AIS for applications in computer
science and engineering, including pattern recognition (Yuan,
Zhang, Zhao, Li, & Zhang, 2012), clustering (de Mello Honorio,
Leite da Silva, & Barbosa, 2012), optimization (Woldemariam &
Yen, 2010), and remote sensing (Zhong & Zhang, 2012). However,
the use of AIS for Bayesian classification has received very
little attention. In this paper, we propose a new AIS-based attribute
weighting method for Naive Bayes classification. The performance
of this design is validated using multiple performance
metrics, including classification accuracy, class probability estimation,
and class ranking performance. It is worth noting that some
works exist that improve AIS for domain-specific problems, such as
an improved Artificial Immune System for seeking the Pareto front
of the land-use allocation problem in large areas (Huang, Liu, Li, Liang,
& He, 2013). However, in this paper, we do not consider such
improved AIS variants for WNB. This is mainly because we aim to propose
a self-adaptive attribute weighting framework based on the
immune system for WNB, and our designs can be easily generalized
to any AIS-based algorithm.
3. Preliminaries and problem definition
Given a training set $D = \{x_1, \ldots, x_N\}$ with $N$ instances, each of
which contains $n$ attribute values and a class label, we use
$x_i = \{x_{i,1}, \ldots, x_{i,j}, \ldots, x_{i,n}, y_i\}$ to denote the $i$th instance $x_i$ in the data
set $D$. $x_{i,j}$ denotes the $j$th attribute value of $x_i$ and $y_i$ denotes the
class label of $x_i$. The class space $Y = \{c_1, \ldots, c_k, \ldots, c_L\}$ denotes
the set of labels to which each instance belongs, and $c_k$ denotes the
$k$th label of the class space. For ease of understanding, we use
$(x_i, y_i)$ as a shorthand to represent an instance and its class label,
and use $x_i$ as a shorthand of $(x_i, y_i)$. We also use $a_j$ as a shorthand to
represent the $j$th attribute. For an instance $(x_i, y_i)$ in the training set $D$,
its class label satisfies $y_i \in Y$, whereas a test instance $x_t$ only contains
attribute values and its class label $y_t$ needs to be predicted
by a weighted Naive Bayes classification model, which can be formally
defined as
$c(x_t) = \arg\max_{c_k \in Y} P(c_k) \prod_{j=1}^{n} P(x_{t,j} \mid c_k)^{w_j}$    (1)
In Eq. (1), $P(c_k)$ represents the prior probability of class $c_k$ in the
whole training set, $P(x_{t,j} \mid c_k)$ denotes the conditional probability
distribution of attribute value $x_{t,j}$ conditioned on the given class $c_k$,
and $w_j$ denotes the weight value of the $j$th attribute.
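For concreteness, the weighted classification rule in Eq. (1) can be sketched in a few lines of Python. This is an illustrative implementation only, assuming categorical attributes and Laplace smoothing; the function and variable names are ours, not the authors':

```python
import math
from collections import Counter, defaultdict

def train_nb(D, n_attrs):
    """Estimate class priors and per-attribute conditional counts from D,
    a list of (attributes, label) pairs with categorical attributes."""
    class_counts = Counter(y for _, y in D)
    cond = defaultdict(Counter)            # (j, c) -> Counter of attribute values
    values = [set() for _ in range(n_attrs)]
    for x, y in D:
        for j in range(n_attrs):
            cond[(j, y)][x[j]] += 1
            values[j].add(x[j])
    priors = {c: k / len(D) for c, k in class_counts.items()}
    return priors, cond, class_counts, values

def classify_wnb(x_t, w, priors, cond, class_counts, values):
    """Eq. (1): argmax_c P(c) * prod_j P(x_{t,j}|c)^{w_j}, computed in log space."""
    def score(c):
        s = math.log(priors[c])
        for j, v in enumerate(x_t):
            # Laplace-smoothed conditional probability, raised to the power w_j
            p = (cond[(j, c)][v] + 1) / (class_counts[c] + len(values[j]))
            s += w[j] * math.log(p)
        return s
    return max(priors, key=score)
```

Setting every $w_j = 1$ recovers standard Naive Bayes, while $w_j = 0$ removes attribute $j$ from the decision entirely; intermediate values interpolate between these extremes.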
In this paper, we focus on the calculation of the conditional
probability $P(x_{i,j} \mid c_k)^{w_j}$ by finding optimal attribute weight values
$w_j$, $j = 1, \ldots, n$. While all existing attribute weighting approaches
define the weights without considering the uniqueness of the
underlying training data, we intend to resolve the optimal $w$ value
selection problem as an optimization process. Assuming that the
calculation of each conditional probability value $P(x_{i,j} \mid c_k)^{w_j}$ has an
optimal $w_j$ value, there are $n$ optimal $w_j$ values needed for NB classification.
As a result, WNB classification can be transformed into the following
optimization problem:
$w^{*} = \arg\max_{w_j \in w} f(x_t, w) \quad \text{s.t.} \quad 0 \leq w_j \leq 1$    (2)
where $w = \{w_1, \ldots, w_j, \ldots, w_n\}$ denotes the attribute weight vector
for WNB, and $f(x_t, w)$ is calculated by Eq. (1).
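The objective in Eq. (2) can be evaluated by scoring a candidate weight vector on held-out instances. A minimal sketch, assuming classification accuracy as the objective and a generic `classify(x, w)` callable implementing Eq. (1) (both are our illustrative assumptions, not the paper's exact formulation):

```python
def fitness(w, classify, D_eval):
    """Fitness of a candidate weight vector w: accuracy of a weighted NB
    classifier on an evaluation set D_eval (a list of (x, y) pairs).
    `classify(x, w)` is any implementation of Eq. (1)."""
    # enforce the constraint 0 <= w_j <= 1 by clipping
    w = [min(1.0, max(0.0, wj)) for wj in w]
    correct = sum(classify(x, w) == y for x, y in D_eval)
    return correct / len(D_eval)
```

Any of the other performance measures mentioned above (class probability estimation, class ranking) could be substituted for accuracy without changing the surrounding search.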
4. Self-adaptive attribute weighted Naive Bayes
4.1. AIS symbol definitions and overall framework
4.1.1. AIS symbol definitions
In this paper, we propose to use AIS to learn optimal attribute
weight values for NB classification. In our solution, antigens in AIS-
WNB are simulated as training instances which are presented to
the system during the training process. Antibodies represent the attribute
weight vector w with different sets of values (i.e., candidates).
The binding of antibodies and antigens resembles the fitness
of a specific weight vector with respect to the given training
data, which can be evaluated using the affinity score.
During the learning process, the antibodies with good affinity
will experience a form of clonal expansion after being presented
with the training data sets (analogous to antigens). When antibodies
are cloned, they undergo a mutation process for which a
specific mutation function will be designed (and deployed). The
evolving optimization process of the AIS system helps discover
the optimal w vector with the best classification performance.
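The clonal expansion and mutation dynamics described above can be sketched as a generic clonal selection loop over weight vectors in $[0,1]^n$. The population size, clone count, and Gaussian mutation below are illustrative assumptions, not the specific operators of AISWNB:

```python
import random

def clonal_selection(fitness, n_attrs, pop_size=20, n_gen=50,
                     clones_per_parent=5, mutation_scale=0.1, seed=0):
    """Generic clonal selection: antibodies are weight vectors in [0,1]^n.
    High-affinity antibodies are cloned and mutated; the weakest slots are
    refilled with random newcomers to preserve diversity."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_attrs)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(n_gen):
        # clonal expansion: clone the better half of the population
        pop.sort(key=fitness, reverse=True)
        clones = []
        for p in pop[:pop_size // 2]:
            for _ in range(clones_per_parent):
                # affinity maturation: Gaussian mutation clipped to [0, 1]
                clones.append([min(1.0, max(0.0, wj + rng.gauss(0, mutation_scale)))
                               for wj in p])
        # selection: keep the best antibodies, refill with random newcomers
        pool = sorted(pop + clones, key=fitness, reverse=True)
        pop = pool[:pop_size - 2] + \
              [[rng.random() for _ in range(n_attrs)] for _ in range(2)]
        cand = max(pop, key=fitness)
        if fitness(cand) > fitness(best):
            best = cand
    return best
```

With the affinity of a weight vector defined as its classification performance, the best antibody returned by this loop is the learned attribute weight vector for WNB.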
Before introducing algorithm details, we briefly define the following
key notations, which will help in understanding the learning of the
weight values using AIS principles. In Table 1, we also summarize
Table 1
Symbol mapping between immune system and AISWNB.

Immune systems        AISWNB
Antibody              Attribute weight vector w
Antigens              Training instances in D
Shape-space           Possible values of the data vectors
Affinity              The fitness of the weight vector w on the testing datasets
Clonal expansion      Reproduction of weight vectors that are well matched with antigens
Affinity maturation   Specific mutation of w vectors and removal of the lowest stimulated weight vectors
Immune memory         Memory set of mutated weight vectors
1490 J. Wu et al. / Expert Systems with Applications 42 (2015) 1487–1502