LDM-RFE: Recursive Feature Elimination with the Large Margin Distribution Machine for Better Classification


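This page summarizes "Large Margin Distribution Machine Recursive Feature Elimination", a research paper from the 2017 4th International Conference on Systems and Informatics (ICSAI 2017). The authors are Ge Ou and Yan Wang of the College of Computer Science and Technology, Jilin University, and Wei Pang and George Macleod Coghill of the Department of Computing Science, University of Aberdeen, UK. Their work addresses how to remove irrelevant features in classification tasks in order to improve model performance.

The core contribution is a new feature selection algorithm called Large Margin Distribution Machine Recursive Feature Elimination (LDM-RFE). The algorithm uses the Large Margin Distribution Machine (LDM), a recent support-vector-based classification algorithm, to evaluate all features of the samples. By recursively discarding the features that matter least to classification performance, LDM-RFE produces a feature list ranked by importance. This strategy helps reduce the risk of overfitting and improves the generalization ability of the model.

In the experiments, LDM-RFE is compared with several common feature selection algorithms on UCI benchmark datasets and shows a clear advantage. In these comparisons, LDM-RFE improves model accuracy and efficiency, and its performance stands out in particular on large-scale, high-dimensional data.

Keywords: feature selection, large margin distribution machine, recursive feature elimination.

The paper is a useful reference for researchers in machine learning and data mining who are looking for an efficient feature selection method, because it offers a practical tool for streamlining model construction and improving the accuracy and stability of classification.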


The 2017 4th International Conference on Systems and Informatics (ICSAI 2017)

Large Margin Distribution Machine Recursive Feature Elimination

Ge Ou, Yan Wang

College of Computer Science and Technology

Jilin University

Changchun, China

*E-mail: wy6868@jlu.edu.cn

Wei Pang*, George Macleod Coghill

Department of Computing Science

University of Aberdeen

Aberdeen, UK

*E-mail: pang.wei@abdn.ac.uk

Abstract: In order to eliminate irrelevant features for classification, we propose a novel feature selection algorithm called Large Margin Distribution Machine Recursive Feature Elimination (LDM-RFE). LDM-RFE uses the latest support vector based classification algorithm, Large Margin Distribution Machine (LDM), to evaluate all the features of samples, and then generates a ranked feature list during the procedure of Recursive Feature Elimination (RFE). In the experiment section, we report promising results obtained by LDM-RFE in comparison with several common feature selection algorithms on five UCI benchmark datasets.

Keywords: feature selection; large margin distribution machine; recursive feature elimination; classification

I. INTRODUCTION

In classification, feature selection [1] is an important technique for avoiding overfitting and reducing computational complexity [2]. Many feature selection algorithms exist for machine learning [3][4]; however, most of them are general-purpose and not specific to classification. Some of them, such as Principal Component Analysis (PCA) [5], the t-test [6], and Kullback-Leibler divergence [7], can be used with any machine learning model. Among feature selection algorithms, Support Vector Machine Recursive Feature Elimination (SVM-RFE) [8] is specifically aimed at classification tasks, and it performs better than other commonly used feature selection algorithms on many problems, especially high-dimensional ones. Furthermore, some related feature selection algorithms for classification have been proposed. Su and Hsiao [9] proposed a Multiclass Mahalanobis-Taguchi system for simultaneous feature selection and multiclass classification. Wang [10] studied a feature selection algorithm for big data problems. Liu [11] proposed a framework for multiclass sentiment classification. In addition, the study of classification models has made new progress over the last few years. Zhou and Zhang [12] proposed the Large Margin Distribution Machine (LDM) algorithm, which achieves better classification performance than the Support Vector Machine (SVM) [13] on the tested problems. LDM is based on the novel theory of optimizing the margin distribution, and it uses dual coordinate descent (DCD) [14] and averaged stochastic gradient descent (ASGD) [15] strategies to solve the optimization problem.

Considering the above, in this research we propose a novel RFE algorithm for classification based on LDM, which we call Large Margin Distribution Machine Recursive Feature Elimination (LDM-RFE). At each iteration, LDM-RFE ranks the features by their contribution to building the LDM model and progressively eliminates the irrelevant ones. We compare LDM-RFE with several commonly used feature selection algorithms, such as the t-test, PCA, and SVM-RFE. The experimental results indicate that LDM-RFE achieves better performance than several other algorithms on five UCI [16] benchmark data sets.
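The rank-and-eliminate loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `mean_difference_weights` is a hypothetical stand-in for training an LDM model, and the squared-weight ranking score is borrowed from SVM-RFE, since the ranking criterion of LDM-RFE is not given in this part of the paper.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def mean_difference_weights(X: Matrix, y: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for LDM training (NOT LDM itself).

    Returns one weight per feature: the difference between the mean of the
    positive class and the mean of the negative class.
    """
    n_features = len(X[0])
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    return [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(n_features)
    ]


def rfe_ranking(
    X: Matrix,
    y: Sequence[int],
    fit_weights: Callable[[Matrix, Sequence[int]], List[float]] = mean_difference_weights,
) -> List[int]:
    """Return feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []  # filled from least to most important
    while remaining:
        # Refit the model on the surviving features only.
        X_sub = [[row[j] for j in remaining] for row in X]
        w = fit_weights(X_sub, y)
        # Assumed criterion (as in SVM-RFE): drop the feature with the smallest w_j^2.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


# Toy data: feature 0 separates the classes, features 1 and 2 barely do.
X = [[2.0, 0.1, 5.0], [1.8, -0.1, 5.1], [-2.1, 0.0, 5.0], [-1.9, 0.1, 4.9]]
y = [1, 1, -1, -1]
ranking = rfe_ranking(X, y)
```

Passing a different `fit_weights` callable (for example, one that trains a real linear classifier) changes the scorer without touching the elimination loop, which is the part the paragraph above describes.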

II. BACKGROUND

Let $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ be a training set of $n$ samples, where $x_i \in \mathbb{R}^{d}$ are the input samples and $y_i \in \{-1, +1\}$ are the labels. The objective function in classification problems is $f(x) = w \cdot \phi(x)$, where $w$ is the weight vector and $\phi(\cdot)$ is the mapping function induced by a kernel $k$, i.e., $k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$, which maps the data to the feature space.
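As a small illustration of the relation between a kernel and its induced mapping, the sketch below checks numerically that a kernel value equals the inner product of the mapped samples. The homogeneous degree-2 polynomial kernel and the function names are assumptions chosen for the example; the text above does not say which kernel the paper uses.

```python
from itertools import product
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Assumed example kernel: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def poly2_feature_map(x: Sequence[float]) -> List[float]:
    """Explicit mapping for the kernel above: all ordered products x_a * x_b."""
    return [a * b for a, b in product(x, repeat=2)]


x_i = [1.0, 2.0, 3.0]
x_j = [0.5, -1.0, 2.0]

kernel_value = poly2_kernel(x_i, x_j)
explicit_value = dot(poly2_feature_map(x_i), poly2_feature_map(x_j))
```

Both values agree up to floating-point rounding, so the kernel can be evaluated without ever building the 9-dimensional mapped vectors.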

A. Large Margin Distribution Machine

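Large Margin Distribution Machine (LDM) [12][17] aims to optimize the margin distribution, that is, to maximize the margin mean and minimize the margin variance at the same time, in order to build the classification model and improve the classification performance. Let $X$ be the matrix whose $i$-th column is $\phi(x_i)$, i.e., $X = [\phi(x_1), \ldots, \phi(x_n)]$, and let $Y = [y_1, \ldots, y_n]^{T}$ be the label vector.

Thus, the margin mean has the following form:

$$\bar{\gamma} = \frac{1}{n} \sum_{i=1}^{n} y_i \, \big( w \cdot \phi(x_i) \big) = \frac{1}{n} (XY)^{T} w,$$

and the margin variance has the following form:

$$\hat{\gamma} = \frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n} \big( y_i \, w \cdot \phi(x_i) - y_j \, w \cdot \phi(x_j) \big)^{2} = \frac{2}{n^{2}} \left( n \, w^{T} X X^{T} w - w^{T} X Y Y^{T} X^{T} w \right).$$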
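The margin mean and margin variance can be checked numerically with a short sketch. It assumes a linear kernel, so that $\phi(x) = x$, stores samples as rows rather than columns, and assumes a $1/n^{2}$ normalization for the pairwise variance; the function names are illustrative only.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def margins(w, X, y) -> List[float]:
    """Per-sample margins y_i * (w . x_i), assuming phi(x) = x."""
    return [label * dot(w, x) for x, label in zip(X, y)]


def margin_mean(w, X, y) -> float:
    m = margins(w, X, y)
    return sum(m) / len(m)


def margin_variance_pairwise(w, X, y) -> float:
    """Sum of squared pairwise margin differences, divided by n^2 (assumed)."""
    m = margins(w, X, y)
    n = len(m)
    return sum((a - b) ** 2 for a in m for b in m) / n ** 2


def margin_variance_matrix(w, X, y) -> float:
    """Same quantity via 2/n^2 * (n * w^T X X^T w - w^T X Y Y^T X^T w)."""
    n = len(X)
    projections = [dot(w, x) for x in X]
    w_xx_w = sum(p * p for p in projections)                # w^T X X^T w
    w_x_y = sum(label * p for p, label in zip(projections, y))  # w^T X Y
    return 2.0 * (n * w_xx_w - w_x_y ** 2) / n ** 2


w = [1.0, -0.5]
X = [[2.0, 1.0], [1.0, 0.0], [-1.0, 1.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]

mean_value = margin_mean(w, X, y)
var_pairwise = margin_variance_pairwise(w, X, y)
var_matrix = margin_variance_matrix(w, X, y)
```

The pairwise and matrix forms return the same value here, which is the identity LDM relies on to express the variance term as a quadratic function of $w$.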



Inspired by maximizing the margin mean and minimizing the margin variance simultaneously, the optimization problem of LDM with soft-margin form is as follows:

[Preview truncated: the remaining 5 pages of the paper are available only after download.]
