Unsupervised Dimension Reduction Using Supervised Orthogonal Discriminant
Projection for Clustering
Leilei Yan
School of Computer Science and Technology
Soochow University
Suzhou, China
20184227032@stu.suda.edu.cn
Li Zhang
School of Computer Science and Technology
Soochow University
Suzhou, China
zhangliml@suda.edu.cn
Abstract—This paper proposes a novel unsupervised dimensionality reduction method for clustering, called SODP-KSCE, which combines supervised orthogonal discriminant projection (SODP) with a K-means selective clustering ensemble. The algorithm operates in an iterative manner, adaptively optimizing the clustering results and learning a subspace with optimal separability. To enhance the stability of K-means, SODP-KSCE adopts ensemble learning. Moreover, a negentropy increment (NI) index is introduced to measure clustering performance. The K-means clustering ensemble algorithm is performed in the low-dimensional subspace to generate pseudo class labels for the unlabeled data, which are then used to guide the dimension reduction of SODP in the original space. Experimental results on multiple data sets demonstrate the effectiveness of SODP-KSCE.
Keywords-dimensionality reduction; unsupervised learning; clustering; supervised orthogonal discriminant projection; ensemble learning
I. INTRODUCTION
In many practical application domains, such as visual category recognition, gene expression array analysis, and image processing, large amounts of high-dimensional data are produced. In gene expression array analysis in particular, the dimensionality of the data reaches thousands or even tens of thousands. Owing to the curse of dimensionality [1], developing an effective clustering method for high-dimensional data is a very challenging problem.
For unlabeled high-dimensional data, the most common approach is to first apply an unsupervised dimensionality reduction technique and then cluster the data in the resulting low-dimensional subspace. Typical unsupervised dimensionality reduction methods include principal component analysis (PCA) [2], [3], isometric feature mapping (ISOMAP) [4], locally linear embedding (LLE) [5], Laplacian eigenmap (LE) [6], locality preserving projection (LPP) [7], [8], unsupervised discriminant projection (UDP) [9], and others. However, the choice of subspace has a great influence on the clustering results, so it is of great importance to find the subspace with the best separability, in which data points belonging to the same cluster are close to each other and points belonging to different clusters are far apart. A minimal sketch of this two-stage pipeline is given below.
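The following sketch (not from this paper) illustrates the conventional reduce-then-cluster pipeline with PCA followed by K-means; the synthetic data and all parameter settings are assumptions made purely for illustration.

```python
# A minimal sketch of the two-stage pipeline: unsupervised dimensionality
# reduction (here PCA) followed by clustering (K-means) in the learned
# subspace. The data and parameters are placeholders, not from the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))           # placeholder high-dimensional data

Z = PCA(n_components=10).fit_transform(X)  # project to a low-dimensional subspace
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Z)
```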
An effective approach for finding the most discriminative subspace is to couple dimension reduction directly with clustering [10]–[12]. Iterative feature and data clustering (IFD) clusters data and features simultaneously by mutually reinforcing the relationships between their coefficients [10]. Ding et al. proposed adaptive dimension reduction for expectation maximization (ADR-EM) and for K-means (ADR-Km), which cluster in a low-dimensional subspace [11]. Ding and Li integrated linear discriminant analysis (LDA) and K-means clustering into a joint framework, yielding the LDA-Km method [12]. LDA-Km first uses K-means to generate pseudo class labels and then applies LDA to perform dimension reduction, alternating between the two steps until convergence. In doing so, the separability of the clusters is maximized in the low-dimensional subspace; a sketch of this alternation is given below.
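The following sketch illustrates the LDA-Km alternation using scikit-learn; the PCA initialization, the stopping test on label stability, and the iteration cap are simplifying assumptions for illustration rather than the exact procedure of [12].

```python
# A hedged sketch of the LDA-Km loop: K-means supplies pseudo labels,
# which supervise LDA in the original space; the learned projection is
# then re-clustered. Assumed details: PCA initialization, an iteration
# cap, and stopping when the pseudo labels no longer change.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_km(X, k, dim, max_iter=20, seed=0):
    Z = PCA(n_components=dim).fit_transform(X)            # initial subspace
    labels = KMeans(n_clusters=k, random_state=seed).fit_predict(Z)
    for _ in range(max_iter):
        # Pseudo class labels supervise LDA in the original space.
        lda = LinearDiscriminantAnalysis(n_components=min(dim, k - 1))
        Z = lda.fit(X, labels).transform(X)
        new_labels = KMeans(n_clusters=k, random_state=seed).fit_predict(Z)
        if np.array_equal(new_labels, labels):            # labels stable: stop
            break
        labels = new_labels
    return labels, Z
```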
However, LDA-Km has two drawbacks. First, K-means is sensitive to its initial cluster centers, so the final clustering result depends heavily on how those centers are chosen and is therefore unstable; LDA-Km inherits this drawback. Second, LDA-Km has a convergence issue: although the whole algorithm forms a closed loop, there is no direct relationship between the projection matrices of the current and the previous iterations, so without a limit on the number of iterations the algorithm may loop indefinitely. The initialization sensitivity can be seen in the small experiment below.
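The following small experiment (an assumed setup, not from the paper) illustrates the first drawback: running K-means with a single initialization under different random seeds may partition the same data differently, which an adjusted Rand index below 1 between two runs makes visible.

```python
# Illustration of K-means' sensitivity to its initial centers: different
# seeds with a single initialization each may yield different partitions.
# The overlapping Gaussian blobs below are synthetic, assumed data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(100, 50)) for c in (0.0, 0.3, 0.6)])

runs = [KMeans(n_clusters=3, n_init=1, random_state=s).fit_predict(X)
        for s in range(5)]
for a, b in zip(runs, runs[1:]):
    # ARI equals 1.0 only when two runs produce identical partitions.
    print(adjusted_rand_score(a, b))
```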
The goal of this paper is to improve the performance of LDA-Km and thereby obtain a novel unsupervised dimensionality reduction method for clustering. Three points about LDA-Km are worth noting. First, LDA could be replaced by another supervised dimensionality reduction method, since LDA requires the data to follow a Gaussian distribution, an assumption that real data do not always satisfy. Thus, similar methods could be considered, such as marginal Fisher analysis (MFA) [13] and supervised orthogonal discriminant projection (SODP) [14]–[16]. SODP has three versions: orthogonal discriminant projection [14], supervised orthogonal discriminant projection [15], and supervised