Feature selection via neighborhood multi-granulation fusion
Yaojin Lin a,*, Jinjin Li a,b, Peirong Lin a, Guoping Lin b, Jinkun Chen b
a School of Computer Science, Minnan Normal University, Zhangzhou 363000, PR China
b School of Mathematics and Statistics, Minnan Normal University, Zhangzhou 363000, PR China
Article history:
Received 7 January 2014
Received in revised form 18 May 2014
Accepted 29 May 2014
Available online 10 June 2014

Keywords: Granular computing; Feature selection; Multi-granulation; Neighborhood rough sets; Granularity influence

Abstract
Feature selection is an important data preprocessing technique, and has been widely studied in data
mining, machine learning, and granular computing. However, very little research has considered a
multi-granulation perspective. In this paper, we present a new feature selection method that selects
distinguishing features by fusing neighborhood multi-granulation. We first use neighborhood rough sets
as an effective granular computing tool, and analyze the influence of the granularity of neighborhood
information. Then, we obtain all feature rank lists based on the significance of features in different
granularities. Finally, we obtain a new feature selection algorithm by fusing all individual feature rank
lists. Experimental results show that the proposed method can effectively select a discriminative feature
subset, and performs as well as or better than other popular feature selection algorithms in terms of
classification performance.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Many data mining and pattern recognition systems suffer from
the curse of dimensionality. This motivates the search for suitable
feature selection methods [7,9,15,24,29,46]. In practice, many
application fields, such as bio-informatics and text categorization,
involve databases in which both the number of rows (objects)
and columns (features) increase rapidly. The high-dimensional
nature of such data challenges learning algorithms: not all features
contribute discriminative power, and irrelevant or correlated
features can cause low efficiency, over-fitting, and poor
performance in traditional learning algorithms. It is therefore
desirable to reduce the dimensionality of the data, as this
enhances the accuracy of pattern recognition and produces a more
compact classification model with better generalization.
As we know, the feature selection technique plays a non-
trivial role in speeding up learning and improving classification
performance [7,9,11,15,46]. To date, a number of feature selec-
tion algorithms have been developed for classification learning.
The process of feature selection can be divided into two steps.
First, metrics such as mutual information [2,3,28], consistency
[4,12], dependency [9,10,48], and the classification margin [33]
are used to evaluate the quality of candidate features. Second, a
search strategy is designed to solve the resulting optimization
problem; common strategies include heuristic search, genetic
optimization, greedy search, and other intelligent search
algorithms [34].
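The two-step scheme above can be sketched in a few lines. The quality metric used here is a toy "consistency" score (the fraction of objects whose feature vector, restricted to the candidate subset, belongs to a single-class group); the metrics cited above (mutual information, dependency, margin) would slot into the same greedy forward search. Function names and the toy data are illustrative, not from the paper.

```python
from collections import defaultdict

def consistency(X, y, subset):
    """Toy quality metric: fraction of objects whose projection onto
    `subset` falls into a pure (single-class) group."""
    groups = defaultdict(set)
    for row, label in zip(X, y):
        groups[tuple(row[i] for i in subset)].add(label)
    pure = sum(1 for row in X
               if len(groups[tuple(row[i] for i in subset)]) == 1)
    return pure / len(X)

def greedy_forward_select(X, y, metric):
    """Step 2: greedy forward search -- repeatedly add the feature
    with the best metric gain until no candidate improves the score."""
    n_features = len(X[0])
    selected, best = [], 0.0
    while len(selected) < n_features:
        score, feat = max((metric(X, y, selected + [f]), f)
                          for f in range(n_features) if f not in selected)
        if score <= best:
            break
        selected.append(feat)
        best = score
    return selected

# Feature 0 alone determines the class, so it is selected first
# and the search stops immediately afterwards.
X = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
y = [0, 0, 1, 1]
print(greedy_forward_select(X, y, consistency))  # [0]
```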
Although many algorithms have been developed for feature
selection, very little work has considered a multi-granulation view.
Granular computing, proposed by Zadeh [44], is an approximation
schema that can effectively solve a complex problem at a single
level or at multiple levels of granulation, and has attracted
increasing interest [26,27,38,43,45]. There are many representative
granular computing models, such as rough sets [25], fuzzy sets
[27,44], probabilistic rough sets [41,42], covering rough sets [49], and
neighborhood rough sets [8–10,18,39]. Of these, neighborhood
rough sets provide an effective granular computing model for the
problem of heterogeneous feature subset selection, and have been
widely applied in cancer recognition, image annotation, and
vibration diagnosis. In the multi-granulation setting, neighborhood
rough sets compute the neighborhoods of samples to extract
information granules (in this paper, neighborhood size and
granularity are used as equivalent terms), and different
neighborhood sizes induce different information granularities
[16,17].
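The neighborhood granulation just described can be illustrated with a minimal sketch: the delta-neighborhood of a sample under Euclidean distance, where the radius `delta` plays the role of the granularity, so enlarging it coarsens the induced granules. The names and data below are illustrative assumptions, not the paper's notation.

```python
import math

def delta_neighborhood(X, i, delta):
    """Indices of samples whose Euclidean distance to sample i is at
    most delta -- the delta-neighborhood (information granule) of i."""
    dist = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    return {j for j, x in enumerate(X) if dist(X[i], x) <= delta}

X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
print(delta_neighborhood(X, 0, 0.2))  # fine granule: {0, 1}
print(delta_neighborhood(X, 0, 2.0))  # coarse granule: {0, 1, 2}
```

Sweeping `delta` over a range of values yields the family of granular spaces from which the per-granularity feature rank lists are built.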
It has been shown that diverse results can be obtained from
different granular spaces for a given learning task. Indeed, given
the same set of objects, different granular spaces can provide com-
plementary predictive powers, and the prediction accuracy is sig-
nificantly improved by combining their information [20–22,30].
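One simple way to combine complementary results from different granular spaces is rank aggregation, e.g. a Borda count over the per-granularity feature rank lists. This is a generic rank-fusion sketch under that assumption, not the paper's exact fusion rule.

```python
def borda_fuse(rank_lists):
    """Sum positional Borda scores across per-granularity rank lists
    (each ordered best-first) and return features by fused score."""
    scores = {}
    for ranks in rank_lists:
        n = len(ranks)
        for pos, feat in enumerate(ranks):
            scores[feat] = scores.get(feat, 0) + (n - pos)
    return sorted(scores, key=lambda f: -scores[f])

# Three granularities mostly agree that feature "a" matters most.
lists = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
print(borda_fuse(lists))  # ['a', 'b', 'c']
```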
The method of combining multiple granularities from different
http://dx.doi.org/10.1016/j.knosys.2014.05.019
* Corresponding author. Tel.: +86 13960044089.
E-mail addresses: yjlin@mail.hfut.edu.cn (Y. Lin), jinjinli@mnnu.edu.cn (J. Li), zzlprfj@163.com (P. Lin), gplin@163.com (G. Lin), cjk99@163.com (J. Chen).
Knowledge-Based Systems 67 (2014) 162–168