Embedded Unsupervised Feature Selection
Suhang Wang, Jiliang Tang, Huan Liu
School of Computing, Informatics, and Decision Systems Engineering
Arizona State University, USA
{suhang.wang, jiliang.tang, huan.liu}@asu.edu
Abstract
Sparse learning has been proven to be a powerful tech-
nique in supervised feature selection, as it allows feature
selection to be embedded into the classification (or re-
gression) problem. In recent years, increasing attention
has been paid to applying sparse learning in unsupervised
feature selection. Due to the lack of label information,
most of these algorithms generate cluster labels via
clustering algorithms and then formulate unsupervised
feature selection as sparse-learning-based supervised
feature selection with the generated
cluster labels. In this paper, we propose a novel unsuper-
vised feature selection algorithm EUFS, which directly
embeds feature selection into a clustering algorithm via
sparse learning without the transformation. The Alter-
nating Direction Method of Multipliers is used to ad-
dress the optimization problem of EUFS. Experimental
results on various benchmark datasets demonstrate the
effectiveness of the proposed framework EUFS.
Introduction
In many real-world applications such as data mining and
machine learning, one is often faced with high-dimensional
data (Jain and Zongker 1997; Guyon and Elisseeff 2003).
Data with high dimensionality not only significantly in-
creases the time and memory requirements of algorithms,
but also degrades the performance of many algorithms
due to the curse of dimensionality and the existence of ir-
relevant, redundant, and noisy dimensions (Liu and Motoda
2007). Feature selection, which reduces the dimensional-
ity by selecting a subset of the most relevant features, has been
proven to be an effective and efficient way to handle high-
dimensional data (John et al. 1994; Liu and Motoda 2007).
In terms of label availability, feature selection methods
can be broadly classified into supervised methods and unsu-
pervised methods. The availability of the class label allows
supervised feature selection algorithms (Duda et al. 2001;
Nie et al. 2008; Zhao et al. 2010; Tang et al. 2014) to ef-
fectively select discriminative features to distinguish sam-
ples from different classes. Sparse learning has been proven
to be a powerful technique in supervised feature selection
(Nie et al. 2010; Gu and Han 2011; Tang and Liu 2012a),
which enables feature selection to be embedded in the clas-
sification (or regression) problem. Since most data is unla-
beled and labeling it is expensive, unsupervised feature
selection has attracted increasing attention in recent
years (Wolf and Shashua 2005; He et al. 2005;
Boutsidis et al. 2009; Yang et al. 2011; Qian and Zhai 2013;
Alelyani et al. 2013).
Without label information to define feature relevance, a
number of alternative criteria have been proposed for un-
supervised feature selection. One commonly used criterion
is to select features that can preserve the data similarity or
manifold structure constructed from the whole feature space
(He et al. 2005; Zhao and Liu 2007). In recent years, apply-
ing sparse learning in unsupervised feature selection has at-
tracted increasing attention. These methods usually generate
cluster labels via clustering algorithms and then transform
unsupervised feature selection into sparse-learning-based
supervised feature selection with the generated cluster labels;
representative examples include Multi-Cluster Feature Selection
(MCFS) (Cai et al. 2010), Nonnegative Discriminative Feature
Selection (NDFS) (Li et al. 2012), and Robust Unsupervised
Feature Selection (RUFS) (Qian and Zhai 2013).
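To make this two-step strategy concrete, the sketch below is a generic
illustration of the idea, not the actual MCFS, NDFS, or RUFS implementation;
the function name two_step_selection and parameters such as n_clusters and
alpha are hypothetical. Pseudo cluster labels are produced by k-means, and
features are then ranked by the row norms of an l2,1-regularized linear model
fit to those labels via proximal gradient descent.

import numpy as np
from sklearn.cluster import KMeans

def two_step_selection(X, n_clusters=5, alpha=1.0, n_iters=200):
    """Rank features of X (n_samples x n_features) using pseudo
    cluster labels and an l2,1-regularized linear model."""
    n, d = X.shape
    # Step 1: pseudo cluster labels from any clustering algorithm.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    Y = np.eye(n_clusters)[labels]          # one-hot cluster indicator matrix
    # Step 2: solve min_W 0.5*||X W - Y||_F^2 + alpha*||W||_{2,1}
    # by proximal gradient descent; step size 1/L with L = ||X||_2^2.
    lr = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-12)
    W = np.zeros((d, n_clusters))
    for _ in range(n_iters):
        W -= lr * (X.T @ (X @ W - Y))       # gradient step on the squared loss
        row_norms = np.linalg.norm(W, axis=1, keepdims=True)
        # proximal step for the l2,1 norm: row-wise soft-thresholding
        W *= np.maximum(0.0, 1.0 - lr * alpha / np.maximum(row_norms, 1e-12))
    scores = np.linalg.norm(W, axis=1)      # row norm = feature importance
    return np.argsort(scores)[::-1]         # feature indices, most important first

The row norms of W act as feature weights: rows driven to zero by the l2,1
penalty correspond to features that are discarded.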
In this paper, we propose a novel unsupervised feature
selection algorithm, i.e., Embedded Unsupervised Feature
Selection (EUFS). Unlike existing unsupervised feature se-
lection methods such as MCFS, NDFS, or RUFS, which
transform unsupervised feature selection into sparse-learning-
based supervised feature selection with cluster labels
generated by clustering algorithms, we directly embed fea-
ture selection into a clustering algorithm via sparse learning
without the transformation (see Figure 1). This work theoret-
ically extends the current state-of-the-art unsupervised fea-
ture selection, algorithmically expands the capability of un-
supervised feature selection, and empirically demonstrates
the efficacy of the new algorithm. The major contributions
of this paper are summarized next.
• Providing a way to directly embed unsupervised feature
selection into a clustering algorithm via sparse
learning instead of transforming it into sparse-learning-
based supervised feature selection with cluster labels;
• Proposing an embedded feature selection framework
EUFS, which selects features in unsupervised scenarios
with sparse learning; and