Embedded Unsupervised Feature Selection
Suhang Wang, Jiliang Tang, Huan Liu
School of Computing, Informatics, and Decision Systems Engineering
Arizona State University, USA
{suhang.wang, jiliang.tang, huan.liu}@asu.edu
Abstract
Sparse learning has been proven to be a powerful tech-
nique in supervised feature selection, as it allows feature
selection to be embedded into the classification (or re-
gression) problem. In recent years, increasing attention
has been paid to applying sparse learning in unsupervised
feature selection. Due to the lack of label information,
most of these algorithms generate cluster labels via
clustering algorithms and then formulate unsupervised
feature selection as sparse-learning-based supervised
feature selection with the generated
cluster labels. In this paper, we propose a novel unsuper-
vised feature selection algorithm EUFS, which directly
embeds feature selection into a clustering algorithm via
sparse learning without the transformation. The Alter-
nating Direction Method of Multipliers is used to ad-
dress the optimization problem of EUFS. Experimental
results on various benchmark datasets demonstrate the
effectiveness of the proposed framework EUFS.
Introduction
In many real-world applications such as data mining and
machine learning, one is often faced with high-dimensional
data (Jain and Zongker 1997; Guyon and Elisseeff 2003).
Data with high dimensionality not only significantly in-
creases the time and memory requirements of algorithms,
but also degrades the performance of many algorithms
due to the curse of dimensionality and the existence of ir-
relevant, redundant, and noisy dimensions (Liu and Motoda
2007). Feature selection, which reduces the dimensional-
ity by selecting a subset of the most relevant features, has been
proven to be an effective and efficient way to handle high-
dimensional data (John et al. 1994; Liu and Motoda 2007).
In terms of label availability, feature selection methods
can be broadly classified into supervised methods and unsu-
pervised methods. The availability of the class label allows
supervised feature selection algorithms (Duda et al. 2001;
Nie et al. 2008; Zhao et al. 2010; Tang et al. 2014) to ef-
fectively select discriminative features to distinguish sam-
ples from different classes. Sparse learning has been proven
to be a powerful technique in supervised feature selection
(Nie et al. 2010; Gu and Han 2011; Tang and Liu 2012a),
which enables feature selection to be embedded in the clas-
sification (or regression) problem. Since most data is unla-
beled and labeling it is expensive, unsupervised feature
selection has attracted increasing attention in recent
years (Wolf and Shashua 2005; He et al. 2005;
Boutsidis et al. 2009; Yang et al. 2011; Qian and Zhai 2013;
Alelyani et al. 2013).
Without label information to define feature relevance, a
number of alternative criteria have been proposed for un-
supervised feature selection. One commonly used criterion
is to select features that can preserve the data similarity or
manifold structure constructed from the whole feature space
(He et al. 2005; Zhao and Liu 2007). In recent years, apply-
ing sparse learning in unsupervised feature selection has at-
tracted increasing attention. These methods usually generate
cluster labels via clustering algorithms and then transform
unsupervised feature selection into sparse-learning-based
supervised feature selection with the generated cluster labels;
representative examples include Multi-Cluster Feature Selection
(MCFS) (Cai et al. 2010), Nonnegative Discriminative Feature
Selection (NDFS) (Li et al. 2012), and Robust Unsupervised
Feature Selection (RUFS) (Qian and Zhai 2013).
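To make this two-step strategy concrete, the sketch below is a generic
illustration of the idea, not the actual MCFS, NDFS, or RUFS implementation;
the function name two_step_selection and parameters such as n_clusters and
alpha are hypothetical. Pseudo cluster labels are produced by k-means, and
features are then ranked by the row norms of an l2,1-regularized linear model
fit to those labels via proximal gradient descent.

import numpy as np
from sklearn.cluster import KMeans

def two_step_selection(X, n_clusters=5, alpha=1.0, n_iters=200):
    """Rank features of X (n_samples x n_features) using pseudo
    cluster labels and an l2,1-regularized linear model."""
    n, d = X.shape
    # Step 1: pseudo cluster labels from any clustering algorithm.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    Y = np.eye(n_clusters)[labels]          # one-hot cluster indicator matrix
    # Step 2: solve min_W 0.5*||X W - Y||_F^2 + alpha*||W||_{2,1}
    # by proximal gradient descent; step size 1/L with L = ||X||_2^2.
    lr = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-12)
    W = np.zeros((d, n_clusters))
    for _ in range(n_iters):
        W -= lr * (X.T @ (X @ W - Y))       # gradient step on the squared loss
        row_norms = np.linalg.norm(W, axis=1, keepdims=True)
        # proximal step for the l2,1 norm: row-wise soft-thresholding
        W *= np.maximum(0.0, 1.0 - lr * alpha / np.maximum(row_norms, 1e-12))
    scores = np.linalg.norm(W, axis=1)      # row norm = feature importance
    return np.argsort(scores)[::-1]         # feature indices, most important first

The row norms of W act as feature weights: rows driven to zero by the l2,1
penalty correspond to features that are discarded.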
In this paper, we propose a novel unsupervised feature
selection algorithm, i.e., Embedded Unsupervised Feature
Selection (EUFS). Unlike existing unsupervised feature se-
lection methods such as MCFS, NDFS, or RUFS, which
transform unsupervised feature selection into sparse-learning-
based supervised feature selection with cluster labels
generated by clustering algorithms, we directly embed fea-
ture selection into a clustering algorithm via sparse learning
without the transformation (see Figure 1). This work theoret-
ically extends the current state-of-the-art unsupervised fea-
ture selection, algorithmically expands the capability of un-
supervised feature selection, and empirically demonstrates
the efficacy of the new algorithm. The major contributions
of this paper are summarized next.
• Providing a way to directly embed unsupervised feature
selection into a clustering algorithm via sparse
learning instead of transforming it into sparse-learning-
based supervised feature selection with cluster labels;
• Proposing an embedded feature selection framework
EUFS, which selects features in unsupervised scenarios
with sparse learning; and