Knowledge-Based Systems 151 (2018) 16–23
Attribute reduction based on max-decision neighborhood rough set model

Xiaodong Fan a,b, Weida Zhao a, Changzhong Wang a,b,∗, Yang Huang a

a Department of Mathematics, Bohai University, Jinzhou 121013, China
b Key Laboratory of Digital Publishing Big Data Mining Governance and Presentation Technology Standard, Bohai University, Jinzhou 121013, China
∗ Corresponding author. E-mail address: changzhongwang@126.com (C. Wang).
Article info
Article history:
Received 19 November 2017
Revised 8 March 2018
Accepted 10 March 2018
Available online 11 March 2018
Keywords:
Attribute reduction
Neighborhood relation
Rough set
Heuristic algorithm
Abstract
The neighborhood rough set model focuses only on the consistent samples, whose neighborhoods are completely contained in some decision class, and ignores the divisibility of the boundary samples, whose neighborhoods cannot be contained in any decision class. In this paper, we pay close attention to the boundary samples and enlarge the positive region by adding the samples whose neighborhoods have a maximal intersection with some decision class. Applying this idea, we introduce a new neighborhood rough set model, named the max-decision neighborhood rough set model, and design an attribute reduction algorithm based on it. Both theoretical analysis and experimental results show that the proposed algorithm effectively removes most redundant attributes without loss of classification accuracy.
©2018 Elsevier B.V. All rights reserved.
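To preview the idea concretely, the following is a minimal NumPy sketch, our own illustration rather than the paper's formal definition, of the enlarged positive region: a sample is kept when its δ-neighborhood (taken here as a Euclidean ball, an assumption) intersects the sample's own decision class at least as much as any other class. Function names and the tie-breaking choice are ours.

```python
import numpy as np

def neighborhoods(X, delta):
    # nbr[i, j] is True when sample j lies in the delta-neighborhood of
    # sample i (Euclidean distance; an assumption for illustration).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return dist <= delta

def max_decision_positive_region(X, y, delta):
    # Classical neighborhood models keep only samples whose neighborhood
    # is label-pure; here a sample is also kept when its own class
    # achieves the maximal intersection with its neighborhood.
    nbr = neighborhoods(X, delta)
    classes = np.unique(y)
    # counts[i, k] = |delta(x_i) ∩ D_k|, the overlap with decision class k
    counts = np.stack([(nbr & (y == c)[None, :]).sum(axis=1) for c in classes],
                      axis=1)
    own = counts[np.arange(len(y)), np.searchsorted(classes, y)]
    return np.where(own == counts.max(axis=1))[0]
```

On a consistent sample the two criteria agree; the difference appears only on boundary samples whose neighborhoods straddle several decision classes.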
1. Introduction
Attribute reduction, also known as feature selection, is a pivotal step of data preprocessing with wide applications in pattern recognition and machine learning. It aims to remove attributes that do not change the classification of the data, so that a smallest possible set of attributes is ultimately obtained for data compression. With the development of the internet, the scale of data keeps growing, and some real-world databases contain thousands of attributes. To shorten processing time and obtain better generalization, the attribute reduction problem has attracted more and more attention in recent years [1–11].
Rough set theory was proposed as a data analysis theory for dealing with uncertain knowledge [12,13]. Classical rough set theory is built on equivalence relations: samples are grouped into the equivalence classes (information granules) generated by these relations, and the lower and upper approximations constructed from the granules are used for attribute reduction. However, classical rough set theory only works for discrete data. Continuous data must be discretized first, and discretization causes a large amount of information loss, so the discretized data cannot accurately reflect the classification information. Therefore, classical rough set theory has been generalized in various directions [14–60], including
neighborhood rough set models [28–42], fuzzy rough set models [43–46,51–60], and so on.
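To make the granulation step concrete, here is a minimal sketch, ours rather than the paper's, of lower and upper approximations over a discrete table; the dict-based sample representation is an assumption for illustration.

```python
from collections import defaultdict

def approximations(samples, attrs, target):
    # Group sample indices into equivalence classes (information granules)
    # keyed by their values on the chosen attributes.
    blocks = defaultdict(set)
    for i, s in enumerate(samples):
        blocks[tuple(s[a] for a in attrs)].add(i)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # granule fully inside the target class
            lower |= block
        if block & target:    # granule overlaps the target class
            upper |= block
    return lower, upper

# Toy usage: three samples described by two categorical attributes;
# the target decision class contains samples 0 and 1.
data = [{"a": 1, "b": 0}, {"a": 1, "b": 0}, {"a": 0, "b": 1}]
print(approximations(data, ["a", "b"], {0, 1}))  # -> ({0, 1}, {0, 1})
```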
The neighborhood rough set model was introduced to deal with the reduction of numerical attributes. On the basis of neighborhood relations, a dependency degree is defined to evaluate the significance of attributes, and attribute reduction algorithms are designed accordingly. Kim [19] proposed a two-stage classification method in which the data are first classified using the lower approximation, and the data left unclassified at the first stage are then classified using rough membership functions obtained from the upper approximation set. Wu and Zhang [28] investigated six classes of k-step neighborhood models as extensions of Pawlak's rough set model. Hu et al. [29] defined the positive region as the set of samples that can be classified without uncertainty, constructed the dependency degree as the ratio of the cardinality of the positive region to that of the sample space, and applied this dependency degree to reduce heterogeneous attributes, namely numerical and categorical attributes. Zhu and Hu [30] explored a multiple-granularity neighborhood model and proposed a method for adaptively selecting a proper granularity by optimizing the margin distribution. Zhao et al. [31] designed an adaptive neighborhood rough set model and developed a backtracking algorithm for cost-sensitive feature selection based on a trade-off between test costs and misclassification costs. Chen et al. [32] proposed a gene selection method based on the neighborhood rough set and an entropy measure for tackling the uncertainty and noise of gene data.
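As an illustration of this line of work, the following sketch computes a dependency degree in the style of Hu et al. [29], the fraction of samples whose δ-neighborhood is label-pure, and greedily grows a reduct. The Euclidean metric, the default δ, and the stopping rule are our assumptions, not the algorithm proposed in this paper.

```python
import numpy as np

def dependency(X, y, attrs, delta):
    # Fraction of samples whose delta-neighborhood on the attribute subset
    # lies entirely inside their own decision class (positive region ratio).
    sub = X[:, attrs]
    dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
    nbr = dist <= delta
    return float(np.mean([np.all(y[nbr[i]] == y[i]) for i in range(len(y))]))

def greedy_reduct(X, y, delta=0.2):
    # Forward greedy heuristic: repeatedly add the attribute that most
    # increases the dependency degree; stop when no attribute helps.
    remaining = list(range(X.shape[1]))
    reduct, best = [], 0.0
    while remaining:
        gain, a = max((dependency(X, y, reduct + [a], delta), a)
                      for a in remaining)
        if gain <= best:
            break
        best, reduct = gain, reduct + [a]
        remaining.remove(a)
    return reduct
```

The max-decision model of this paper replaces the label-purity test with the maximal-intersection criterion sketched earlier, which enlarges the positive region and changes which attributes look significant to the heuristic.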
https://doi.org/10.1016/j.knosys.2018.03.015