ORIGINAL ARTICLE
Nonlinear discriminant clustering based on spectral regularization

Yubin Zhan · Jianping Yin · Xinwang Liu
Received: 20 April 2011 / Accepted: 24 March 2012 / Published online: 19 April 2012
© Springer-Verlag London Limited 2012
Abstract Owing to data sparseness, directly clustering high-dimensional data remains a challenging problem. Obtaining a low-dimensional compact representation through dimensionality reduction is therefore an effective strategy for clustering high-dimensional data. Most existing dimensionality reduction methods, however, were originally developed for classification (such as Linear Discriminant Analysis) or for recovering the geometric structure (known as the manifold) of high-dimensional data (such as Locally Linear Embedding), rather than for clustering. Hence, a novel nonlinear discriminant clustering method based on dimensionality reduction with spectral regularization is proposed. The contributions of the proposed method are twofold: (1) it obtains a nonlinear low-dimensional representation that recovers the intrinsic manifold structure and enhances the cluster structure of the original high-dimensional data; (2) the clustering result is obtained directly in the dimensionality reduction procedure. First, the desired low-dimensional coordinates are represented as linear combinations of predefined vectors that are smooth with respect to the data manifold, which is characterized by a weighted graph. Then, the optimal combination coefficients and the optimal cluster assignment matrix are computed simultaneously by maximizing the ratio between the between-cluster scatter and the total scatter while preserving the smoothness of the cluster assignment matrix with respect to the data manifold. Finally, the optimization problem is solved by an iterative procedure, which is proved to be convergent. Experiments on UCI data sets and real-world data sets demonstrate the effectiveness of the proposed method for both clustering and visualizing high-dimensional data.
Keywords Dimensionality reduction · Laplacian graph · Spectral regularization · Cluster structure
1 Introduction
Real applications in many domains such as pattern recognition, computer vision, and data mining often involve very high-dimensional data. For example, in appearance-based face recognition, a face image with 64 × 64 pixels is often represented as a 4096-dimensional vector. Due to the sparsity of data in high-dimensional space, clustering high-dimensional data directly is still a challenging problem. A natural solution is to project the data into a low-dimensional compact space through a dimensionality reduction method such as PCA before clustering [7], as sketched below. However, because of the intrinsic gap between clustering and existing dimensionality reduction methods, which were not originally designed for clustering, the clustering structure of the original data cannot be well preserved and may even be destroyed in the transformed low-dimensional space.
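To make this baseline concrete, the following is a minimal sketch of the "reduce, then cluster" pipeline using scikit-learn; the data sizes, the target dimensionality, and the number of clusters are illustrative assumptions rather than settings from this paper.

# Baseline sketch: PCA for dimensionality reduction followed by k-means.
# All sizes below (200 samples, 4096 features, 10 dimensions, 5 clusters)
# are illustrative choices, not values used in the paper.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.randn(200, 4096)                       # e.g. 200 face images of 64 x 64 pixels

X_low = PCA(n_components=10).fit_transform(X)  # compact low-dimensional representation
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_low)

The weakness of this pipeline, as noted above, is that PCA is unaware of any cluster structure, so the projection it selects need not preserve that structure.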
In the machine learning and data mining communities, dimensionality reduction is an effective and widely used approach to dealing with high-dimensional data, owing to its potential for mitigating the so-called "curse of dimensionality". Up to now, many dimensionality reduction methods have been developed. Since no supervised information is provided in the classical clustering task, we focus here only on unsupervised dimensionality reduction methods.
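As a concrete illustration of the kind of "predefined smooth vectors with respect to the data manifold" mentioned in the abstract, the sketch below builds a weighted k-nearest-neighbor graph and takes the eigenvectors of its graph Laplacian associated with the smallest eigenvalues, which vary slowly over the graph. The graph construction (neighborhood size k, Gaussian weights with bandwidth sigma) and the helper name smooth_basis are common choices assumed here, not necessarily the exact construction used by the proposed method.

# Sketch: smooth basis vectors from the graph Laplacian of a k-NN graph.
# The parameters (k, sigma, n_vectors) and the dense eigendecomposition are
# illustrative assumptions; the paper's own graph construction may differ.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

def smooth_basis(X, n_vectors=10, k=10, sigma=1.0):
    # Symmetric k-NN graph with Gaussian (heat-kernel) edge weights
    W = kneighbors_graph(X, n_neighbors=k, mode='distance')
    W.data = np.exp(-W.data ** 2 / (2.0 * sigma ** 2))
    W = 0.5 * (W + W.T)
    # Normalized graph Laplacian; its eigenvectors with the smallest
    # eigenvalues are the smoothest functions on the graph.
    L = laplacian(W, normed=True).toarray()
    eigvals, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    return eigvecs[:, :n_vectors]              # columns: smooth basis vectors

Low-dimensional coordinates can then be sought as linear combinations of these columns, which is the role the predefined smooth vectors play in the method summarized in the abstract.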
Y. Zhan (✉) · J. Yin · X. Liu
School of Computer, National University of Defense Technology, Changsha 410073, Hunan, People's Republic of China
e-mail: yubinzhan@nudt.edu.cn

J. Yin
e-mail: JPYin@nudt.edu.cn

X. Liu
e-mail: XWLiu@nudt.edu.cn
Neural Comput & Applic (2013) 22:1599–1608
DOI 10.1007/s00521-012-0929-y