Knowledge-Based Systems 151 (2018) 16–23
Attribute reduction based on max-decision neighborhood rough set model

Xiaodong Fan a,b, Weida Zhao a, Changzhong Wang a,b,∗, Yang Huang a

a Department of Mathematics, Bohai University, Jinzhou 121013, China
b Key Laboratory of Digital Publishing Big Data Mining Governance and Presentation Technology Standard, Bohai University, Jinzhou 121013, China
∗ Corresponding author. E-mail address: changzhongwang@126.com (C. Wang).
Article info
Article history:
Received 19 November 2017
Revised 8 March 2018
Accepted 10 March 2018
Available online 11 March 2018
Keywords:
Attribute reduction
Neighborhood relation
Rough set
Heuristic algorithm
Abstract
The neighborhood rough set model focuses only on the consistent samples, whose neighborhoods are completely contained in some decision class, and ignores the divisibility of the boundary samples, whose neighborhoods cannot be contained in any decision class. In this paper, we pay close attention to the boundary samples and enlarge the positive region by adding the samples whose neighborhoods have a maximal intersection with some decision class. Applying this idea, we introduce a new neighborhood rough set model, named the max-decision neighborhood rough set model, and design an attribute reduction algorithm based on it. Both theoretical analysis and experimental results show that the proposed algorithm effectively removes most redundant attributes without loss of classification accuracy.
©2018 Elsevier B.V. All rights reserved.
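To preview the idea concretely, the following is a minimal NumPy sketch, our own illustration rather than the paper's formal definition, of the enlarged positive region: a sample is kept when its δ-neighborhood (taken here as a Euclidean ball, an assumption) intersects the sample's own decision class at least as much as any other class. Function names and the tie-breaking choice are ours.

```python
import numpy as np

def neighborhoods(X, delta):
    # nbr[i, j] is True when sample j lies in the delta-neighborhood of
    # sample i (Euclidean distance; an assumption for illustration).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return dist <= delta

def max_decision_positive_region(X, y, delta):
    # Classical neighborhood models keep only samples whose neighborhood
    # is label-pure; here a sample is also kept when its own class
    # achieves the maximal intersection with its neighborhood.
    nbr = neighborhoods(X, delta)
    classes = np.unique(y)
    # counts[i, k] = |delta(x_i) ∩ D_k|, the overlap with decision class k
    counts = np.stack([(nbr & (y == c)[None, :]).sum(axis=1) for c in classes],
                      axis=1)
    own = counts[np.arange(len(y)), np.searchsorted(classes, y)]
    return np.where(own == counts.max(axis=1))[0]
```

On a consistent sample the two criteria agree; the difference appears only on boundary samples whose neighborhoods straddle several decision classes.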
1. Introduction
Attribute reduction, also known as feature selection, is a pivotal step of data preprocessing with wide applications in pattern recognition and machine learning. It aims to remove attributes that do not change the classification of the data, so that a smallest possible set of attributes is ultimately obtained for data compression. With the development of the internet, the scale of data keeps growing, and some real-world databases contain thousands of attributes. To shorten processing time and obtain better generalization, the attribute reduction problem has attracted more and more attention in recent years [1–11].
Rough set theory was proposed as a data analysis theory for dealing with uncertain knowledge [12,13]. Classical rough set theory is built on equivalence relations: samples are grouped into the equivalence classes (information granules) generated by these relations, and the lower and upper approximations constructed from the granules are used for attribute reduction. However, classical rough set theory only works for discrete data. Continuous data must be discretized first, and discretization causes a large amount of information loss, so the discretized data cannot accurately reflect the classification information. Therefore, classical rough set theory has been generalized in various directions [14–60], including
neighborhood rough set models [28–42], fuzzy rough set models [43–46,51–60], and so on.
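To make the granulation step concrete, here is a minimal sketch, ours rather than the paper's, of lower and upper approximations over a discrete table; the dict-based sample representation is an assumption for illustration.

```python
from collections import defaultdict

def approximations(samples, attrs, target):
    # Group sample indices into equivalence classes (information granules)
    # keyed by their values on the chosen attributes.
    blocks = defaultdict(set)
    for i, s in enumerate(samples):
        blocks[tuple(s[a] for a in attrs)].add(i)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # granule fully inside the target class
            lower |= block
        if block & target:    # granule overlaps the target class
            upper |= block
    return lower, upper

# Toy usage: three samples described by two categorical attributes;
# the target decision class contains samples 0 and 1.
data = [{"a": 1, "b": 0}, {"a": 1, "b": 0}, {"a": 0, "b": 1}]
print(approximations(data, ["a", "b"], {0, 1}))  # -> ({0, 1}, {0, 1})
```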
The neighborhood rough set model was introduced to deal with the reduction of numerical attributes. On the basis of neighborhood relations, a dependency degree is defined to evaluate the significance of attributes, and attribute reduction algorithms are designed accordingly. Kim [19] proposed a two-stage classification method in which the data are first classified using the lower approximation, and the data left unclassified at the first stage are then classified using rough membership functions obtained from the upper approximation set. Wu and Zhang [28] investigated six classes of k-step neighborhood models as extensions of Pawlak's rough set model. Hu et al. [29] defined the positive region as the set of samples that can be classified without uncertainty, constructed the dependency degree as the ratio of the cardinality of the positive region to that of the sample space, and applied this dependency degree to reduce heterogeneous attributes, namely numerical and categorical attributes. Zhu and Hu [30] explored a multiple-granularity neighborhood model and proposed a method for adaptively selecting a proper granularity by optimizing the margin distribution. Zhao et al. [31] designed an adaptive neighborhood rough set model and developed a backtracking algorithm for cost-sensitive feature selection based on a trade-off between test costs and misclassification costs. Chen et al. [32] proposed a gene selection method based on the neighborhood rough set and an entropy measure for tackling the uncertainty and noise of gene data.
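As an illustration of this line of work, the following sketch computes a dependency degree in the style of Hu et al. [29], the fraction of samples whose δ-neighborhood is label-pure, and greedily grows a reduct. The Euclidean metric, the default δ, and the stopping rule are our assumptions, not the algorithm proposed in this paper.

```python
import numpy as np

def dependency(X, y, attrs, delta):
    # Fraction of samples whose delta-neighborhood on the attribute subset
    # lies entirely inside their own decision class (positive region ratio).
    sub = X[:, attrs]
    dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
    nbr = dist <= delta
    return float(np.mean([np.all(y[nbr[i]] == y[i]) for i in range(len(y))]))

def greedy_reduct(X, y, delta=0.2):
    # Forward greedy heuristic: repeatedly add the attribute that most
    # increases the dependency degree; stop when no attribute helps.
    remaining = list(range(X.shape[1]))
    reduct, best = [], 0.0
    while remaining:
        gain, a = max((dependency(X, y, reduct + [a], delta), a)
                      for a in remaining)
        if gain <= best:
            break
        best, reduct = gain, reduct + [a]
        remaining.remove(a)
    return reduct
```

The max-decision model of this paper replaces the label-purity test with the maximal-intersection criterion sketched earlier, which enlarges the positive region and changes which attributes look significant to the heuristic.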
https://doi.org/10.1016/j.knosys.2018.03.015