Journal of Qingdao University of Science and Technology (Natural Science Edition), Vol. 34, No. 4, Aug. 2013
Article ID: 1672-6987(2013)04-0423-04
A Multivariable Discretization Algorithm for Continuous Attributes in Rough Set Theory Based on Information Entropy
WANG Ju-fan, CHEN Zhuo
(College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266042, Shandong, China)
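Abstract: Attribute discretization reduces the complexity of a problem and yields rules that are shorter, more accurate and easier to understand. Existing discretization methods ignore the mutual exclusivity of breakpoints between and within attributes when selecting breakpoints, and cannot guarantee that the indiscernibility relation of the decision table is preserved. To address this, a new information-entropy-based multivariable discretization algorithm for continuous attributes in rough set theory (PAD) is proposed. It uses information entropy as the criterion for selecting breakpoints and the indiscernibility relation as the stopping criterion, and introduces five strategies for breakpoint pre-selection and final selection. Experimental results show that, compared with the five discretization algorithms in the Rosetta software, the PAD algorithm with these pre-selection and final-selection strategies achieves higher prediction accuracy with fewer breakpoints.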
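The abstract states that PAD uses information entropy to rank candidate breakpoints, but this excerpt does not give the exact formula. As a minimal sketch, assuming the widely used class-information entropy of a binary cut (for a cut c on one attribute, E(c) = |U_L|/|U| * H(U_L) + |U_R|/|U| * H(U_R), where H is the Shannon entropy of the decision classes and lower is better), candidate cuts can be taken as midpoints between consecutive distinct values and scored one by one. The function names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the decision labels in one interval."""
    n = len(labels)
    if n == 0:
        return 0.0
    # Sum of p * log2(1/p) over classes; a pure interval gives exactly 0.0.
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def candidate_cuts(values):
    """Candidate breakpoints: midpoints between consecutive distinct values."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]


def cut_entropy(values, labels, cut):
    """Class-information entropy of the binary split x <= cut / x > cut."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def best_cut(values, labels):
    """Return (cut, entropy) for the lowest-entropy candidate, or None."""
    cuts = candidate_cuts(values)
    if not cuts:
        return None
    return min(((c, cut_entropy(values, labels, c)) for c in cuts),
               key=lambda pair: pair[1])
```

For example, best_cut([1.0, 2.0, 3.0, 4.0], ['a', 'a', 'b', 'b']) returns (2.5, 0.0): the cut at 2.5 separates the two classes perfectly, whereas 1.5 and 3.5 each leave one mixed interval. This sketch scores cuts on a single attribute; a multivariable method such as PAD would compare candidates across all attributes at once.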
Keywords: rough set; indiscernibility relation; discretization; information entropy
CLC number: P208
Document code: A
Multiple Variable Discretization Algorithm of Continuous Attributes in Rough Set Theory Based on Information Entropy
WANG Ju-fan, CHEN Zhuo
(College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266042, China)
Abstract: Attribute discretization can reduce problem complexity and yield shorter, more accurate and more comprehensible rules. When selecting breakpoints, existing discretization methods do not take into account the mutual exclusion of breakpoints among and within attributes, and therefore cannot maintain the indiscernibility relation of the decision table. In this paper, a new multiple variable discretization algorithm for continuous attributes in rough set theory based on information entropy (PAD) is proposed. The new algorithm employs information entropy as the measure for choosing breakpoints, takes the indiscernibility relation as the stopping criterion, and introduces five strategies for breakpoint pre-selection and final selection. Experimental results show that the PAD algorithm achieves higher prediction accuracy with fewer breakpoints than the five discretization algorithms provided in the Rosetta software.
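The abstract also says that PAD uses the indiscernibility relation of the decision table as its stopping criterion. A minimal sketch of such a check follows, assuming the usual rough-set notion of a consistent cut set: any two objects with different decisions that are distinguishable on their original values must remain distinguishable after discretization. The names discretize, preserves_indiscernibility and greedy_until_consistent are hypothetical, and the caller-supplied ranked list of (attribute, cut) pairs merely stands in for PAD's five pre-selection and final-selection strategies, which this excerpt does not describe.

```python
from bisect import bisect_left, insort
from itertools import combinations


def discretize(row, cuts):
    """Replace each value by its interval index; cuts[k] is the sorted list
    of breakpoints for attribute k (a value equal to a cut falls to the left)."""
    return tuple(bisect_left(cuts[k], v) for k, v in enumerate(row))


def preserves_indiscernibility(rows, decisions, cuts):
    """True if every pair of objects with different decisions that is
    discernible on the original values is still discernible after cutting."""
    coded = [discretize(r, cuts) for r in rows]
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j] and tuple(rows[i]) != tuple(rows[j]):
            if coded[i] == coded[j]:
                return False
    return True


def greedy_until_consistent(rows, decisions, ranked_cuts):
    """Add (attribute, cut) pairs in the given order and stop as soon as the
    indiscernibility relation is preserved; return None if the list runs out."""
    cuts = [[] for _ in rows[0]]  # one empty cut list per attribute
    if preserves_indiscernibility(rows, decisions, cuts):
        return cuts
    for attr, c in ranked_cuts:
        insort(cuts[attr], c)  # keep each attribute's cut list sorted
        if preserves_indiscernibility(rows, decisions, cuts):
            return cuts
    return None
```

For instance, with objects (1.0, 5.0), (2.0, 6.0), (3.0, 5.0), (4.0, 6.0) and decisions a, a, b, b, the cut 5.5 on the second attribute alone leaves (1.0, 5.0) and (3.0, 5.0) indistinguishable despite their different decisions, so the loop stops only after 2.5 is added on the first attribute. The pairwise check is O(n^2) and was chosen for readability; grouping objects by their coded vectors would scale better.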
Key words: rough sets; indiscernibility; discretization; information entropy
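Received: 2012-09-26
Foundation item: National Natural Science Foundation of China (…127'180)
About the author: WANG Ju-fan (1986-), male, master's student.

Discretization has attracted extensive attention and research, and many results have been obtained [1-4]. Discretization algorithms can be divided into supervised and unsupervised algorithms. Unsupervised discretization algorithms do not use class information to improve their performance: when choosing breakpoints, they do not select them according to the characteristics of the data itself, but simply partition the attribute space rigidly according to some fixed criterion. For this reason, a variety of supervised univariate discretization algorithms have been proposed [5-8]. However, univariate discretization algorithms do not take into account the interdependence between attributes; they have merely progressed from the unsupervised discretization paradigm to considering, in isolation, a single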