DSKmeans: A New Discriminative Subspace Clustering Method

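"DSKmeans: A New Kmeans-type Approach to Discriminative Subspace Clustering"

In data mining, clustering is a common unsupervised learning method, and K-means is the best-known and most widely used clustering algorithm. Traditional K-means, however, relies mainly on intra-cluster compactness, that is, a measure of how dispersed the points within a cluster are, and pays little attention to inter-cluster separation, which matters a great deal in classification tasks. To address this, the research paper "DSKmeans: A New Kmeans-type Approach to Discriminative Subspace Clustering" proposes a new K-means-type clustering method, DSKmeans (Discriminative Subspace K-means), which combines intra-cluster compactness with inter-cluster separation to obtain a more discriminative subspace clustering.

The core idea of DSKmeans is to optimize intra-cluster compactness and inter-cluster separation at the same time. Rather than only reducing the variance within clusters, it also increases the differences between clusters so that the resulting clusters are easier to tell apart. This is especially important for high-dimensional data, which often contains redundant features: by selecting a discriminative subspace, DSKmeans can reduce the effective dimensionality and improve clustering quality. The key techniques mentioned in the paper include:

1. **Feature selection**: DSKmeans involves a feature selection process that looks for the subset of features with the most discriminative power for clustering. This reduces the influence of irrelevant features and improves efficiency and accuracy.
2. **3-order tensor**: According to the abstract, a 3-order tensor is constructed to evaluate the importance of different features, so that compactness and separation can be integrated. A tensor is a multi-dimensional array.
3. **Subspace clustering**: DSKmeans works in low-dimensional subspaces of the data, seeking the subspace that maximizes inter-cluster distance and minimizes intra-cluster distance.
4. **Iterative process**: Like K-means, DSKmeans iteratively updates the cluster centers and assigns samples to clusters, optimizing compactness and separation together as it does so.
5. **Performance evaluation**: The abstract reports experiments on several numerical and categorical data sets, comparing DSKmeans with other kmeans-type clustering algorithms on four metrics: Accuracy, RandIndex, Fscore and Normalized Mutual Information (NMI).

DSKmeans offers a new perspective on the limited discriminative power of traditional K-means: by combining intra-cluster compactness with inter-cluster separation, it improves clustering performance. The approach has broad application prospects in data mining, pattern recognition and machine learning.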

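As a rough illustration of the two quantities discussed in the summary, the sketch below runs plain K-means (not DSKmeans, whose objective function and update rules are not given in this excerpt) and then measures intra-cluster compactness and inter-cluster separation. The function names and the particular separation measure (size-weighted squared distance of each center from the global mean) are assumptions chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=50, seed=0):
    """Plain K-means: alternate assignment and center update until stable."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, new_labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def compactness(points, labels, centers):
    """Sum of squared distances from each point to its own center (lower is tighter)."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))


def separation(points, labels, centers):
    """Size-weighted squared distance of each center from the global mean (illustrative choice)."""
    g = mean(points)
    return sum(labels.count(j) * sq_dist(c, g) for j, c in enumerate(centers))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels, compactness(points, labels, centers), separation(points, labels, centers))
```

Plain K-means only drives the first quantity down. With this particular separation measure the two quantities always sum to the total scatter of the data, so a method that is meant to use separation as extra information needs a different formulation, such as feature weights that depend on which clusters are being compared.
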
Short Communication

DSKmeans: A new kmeans-type approach to discriminative subspace clustering

Xiaohui Huang, Yunming Ye ⇑, Huifeng Guo, Yi Cai, Haijun Zhang, Yan Li

Shenzhen Key Laboratory of Internet Information Collaboration, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, China

School of Software Engineering, South China University of Technology, Guangzhou, China

Shenzhen Polytechnic, Liuxian Road, Shenzhen 518055, China

article info

Article history:

Received 23 February 2014

Received in revised form 15 June 2014

Accepted 15 July 2014

Available online 27 July 2014

Keywords:

Kmeans clustering

Feature selection

3-Order tensor

Data mining

Subspace clustering

abstract

Most kmeans-type clustering algorithms rely only on intra-cluster compactness, i.e. the dispersions of a cluster. Inter-cluster separation, which is widely used in classification algorithms, is rarely considered in the clustering process. In this paper, we present a new discriminative subspace kmeans-type clustering algorithm (DSKmeans), which integrates intra-cluster compactness and inter-cluster separation simultaneously. Unlike traditional weighting kmeans-type algorithms, DSKmeans constructs a 3-order tensor to evaluate the importance of different features in order to integrate these two types of information. First, a new objective function for clustering is designed. To optimize the objective function, the corresponding updating rules for the algorithm are then derived analytically. The properties and performance of DSKmeans are investigated on several numerical and categorical data sets. Experimental results corroborate that the proposed algorithm outperforms state-of-the-art kmeans-type clustering algorithms with respect to four metrics: Accuracy, RandIndex, Fscore and Normalized Mutual Information (NMI).
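
Two of the four metrics named in the abstract can be sketched in a few lines. The code below is a minimal illustration, not the evaluation code used in the paper: the function names are hypothetical, and the NMI normalisation (geometric mean of the two entropies) is one common convention that the excerpt does not confirm.

```python
from collections import Counter
from math import log


def rand_index(truth, pred):
    """Fraction of object pairs on which the two labelings agree."""
    n = len(truth)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_truth = truth[i] == truth[j]
            same_pred = pred[i] == pred[j]
            agree += same_truth == same_pred
    return agree / (n * (n - 1) / 2)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_information(truth, pred):
    """Mutual information between two labelings of the same objects."""
    n = len(truth)
    ct, cp = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    return sum(c / n * log(c * n / (ct[t] * cp[p]))
               for (t, p), c in joint.items())


def nmi(truth, pred):
    """MI divided by the geometric mean of the entropies (assumed convention)."""
    ht, hp = entropy(truth), entropy(pred)
    if ht == 0 or hp == 0:
        return 1.0 if ht == hp else 0.0
    return mutual_information(truth, pred) / (ht * hp) ** 0.5


truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]  # one object placed in the wrong cluster
print(rand_index(truth, pred), nmi(truth, pred))
```

Both scores depend only on how objects are grouped, not on the label names, so a clustering that swaps the names of two clusters still scores 1.0 against the ground truth.
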

1. Introduction

Clustering techniques have been used extensively in many fields [1], such as bioinformatics [2], text organization [3], and community detection [4], to name just a few. Clustering is an unsupervised classification technique that aims to partition a data set into clusters such that the objects within a cluster are similar and the objects in different clusters are dissimilar according to certain pre-defined criteria [5].

Clustering algorithms [6] can be grouped into partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, and others. The kmeans-type clustering algorithm is a widely used partitioning method in many real-life applications. Many researchers have extended the kmeans algorithm with different weighting schemes. According to the weighting scheme, existing kmeans-type algorithms can be classified into three categories: (1) No weighting kmeans-type algorithms [7–10], which treat all features equally in the process of minimizing the dispersions of clusters. Different features, however, have different discriminative capabilities in real-world applications. Therefore, different types of feature selection and weighting methods have been proposed for clustering. (2) Vector weighting kmeans-type algorithms, which have been reported in [5,11–15]. (3) Matrix weighting kmeans-type algorithms, examples of which are proposed in [16–21,3,22,23]. Most of these weighting kmeans-type clustering algorithms consider only that the objects in the same cluster should be similar, i.e. they minimize the dispersions of all the clusters, with the features weighted by different methods.
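
The three categories differ in the shape of the weights that enter the distance computation. The sketch below assumes the usual reading of the terms, which the excerpt itself does not spell out: no weighting uses unit weights, vector weighting uses one weight per feature shared by all clusters, and matrix weighting uses one weight per (cluster, feature) pair. All names and numbers are illustrative and are not taken from the cited algorithms.

```python
def weighted_sq_dist(x, center, weights):
    """Weighted squared Euclidean distance: sum_j w_j * (x_j - c_j)^2."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center))


def assign(x, centers, weights_for_cluster):
    """Index of the nearest center; weights_for_cluster(l) returns cluster l's feature weights."""
    return min(range(len(centers)),
               key=lambda l: weighted_sq_dist(x, centers[l], weights_for_cluster(l)))


centers = [[0.0, 0.0], [4.0, 1.0]]
x = [1.0, 3.0]

# (1) No weighting: every feature counts equally.
no_w = assign(x, centers, lambda l: [1.0, 1.0])

# (2) Vector weighting: one weight per feature, shared by all clusters.
v = [0.1, 0.9]
vec_w = assign(x, centers, lambda l: v)

# (3) Matrix weighting: one weight per (cluster, feature) pair.
W = [[0.9, 0.1],
     [0.1, 0.9]]
mat_w = assign(x, centers, lambda l: W[l])

print(no_w, vec_w, mat_w)
```

In this toy case the same object is assigned to a different cluster under the vector weights than under the other two schemes, which shows how much the weighting scheme alone can change a partition. In all three schemes the weights serve only to measure the dispersion of objects around their own cluster center.
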

However, a feature in a cluster may have different discriminative capabilities when we compare this cluster with other clusters. For example, there are three clusters (C1, C2 and C3) in Fig. 1 (the distributions of features are listed in the table). Two weighting vectors W are shown: one for comparing cluster 1 (C1) with cluster 2 (C2), and one for comparing C1 with cluster 3 (C3). We can observe that the features "Olympic, sport, chaos, riots" have more discriminative capability when comparing C1 with C2. In contrast, when comparing C1 with C3, the features "London, England, Beijing, China" have more discriminative capability. The same features of C1, "London, England", have different discriminative capabilities when compared with different clusters: they are less discriminative in distinguishing C1 from C2, but more discriminative in distinguishing C1 from C3.
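
The point of this example can be reproduced with a small sketch. The centroid values below are hypothetical (the table in Fig. 1 is not part of this excerpt), and the weighting rule, a normalised absolute difference between two cluster centroids, is a simple stand-in chosen for illustration, not the update rule derived in the paper. Storing one weight per (cluster, other cluster, feature) triple is one plausible way to picture a 3-order weighting structure, but that layout is an assumption as well.

```python
FEATURES = ["Olympic", "sport", "chaos", "riots",
            "London", "England", "Beijing", "China"]

# Hypothetical term frequencies per cluster centroid (NOT the values in Fig. 1).
CENTROIDS = {
    "C1": [0.3, 0.3, 0.0, 0.0, 0.2, 0.2, 0.0, 0.0],
    "C2": [0.0, 0.0, 0.3, 0.3, 0.2, 0.2, 0.0, 0.0],
    "C3": [0.3, 0.3, 0.0, 0.0, 0.0, 0.0, 0.2, 0.2],
}


def pairwise_weights(a, b):
    """Per-feature weight for telling cluster a from cluster b (illustrative rule)."""
    diff = [abs(x - y) for x, y in zip(CENTROIDS[a], CENTROIDS[b])]
    total = sum(diff) or 1.0  # avoid division by zero for identical centroids
    return {f: d / total for f, d in zip(FEATURES, diff)}


def discriminative_features(a, b):
    """Features with non-zero weight for the cluster pair (a, b)."""
    w = pairwise_weights(a, b)
    return [f for f in FEATURES if w[f] > 0]


# One weight per (cluster, other cluster, feature) triple.
weights = {(a, b): pairwise_weights(a, b)
           for a in CENTROIDS for b in CENTROIDS if a != b}

print(discriminative_features("C1", "C2"))
print(discriminative_features("C1", "C3"))
```

With these toy values, "London" and "England" receive zero weight for the pair (C1, C2) and positive weight for the pair (C1, C3), which matches the behaviour described in the text: the usefulness of a feature depends on which other cluster is being compared.
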

Motivated by the example in Fig. 1, we propose a new kmeans-type algorithm by integrating the intra-cluster compactness and the inter-cluster separation with a 3-order tensor weighting

http://dx.doi.org/10.1016/j.knosys.2014.07.009

⇑ Corresponding author.

Knowledge-Based Systems 70 (2014) 293–300

