Exploring a Multi-Center Fuzzy C-Means Algorithm Combining Transitive Closure and Spectral Clustering

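"This research paper explores a multi-center fuzzy C-means (FCM) algorithm based on transitive closure and spectral clustering. It was written jointly by researchers from Wuhan University of Technology, Wuhan Textile University and Huazhong University of Science and Technology, and was published in the journal Applied Soft Computing. The algorithm aims to improve the traditional fuzzy C-means clustering method and to address its problems on complex datasets, such as the single-center restriction and the sensitivity to noisy data."

Body: Fuzzy C-means (FCM) is a clustering method widely used in data mining and machine learning, valued because it can handle samples with fuzzy boundaries. Traditional FCM has limitations, however. It represents each cluster by a single center, which may not reflect the complex structure of real data accurately. It is also sensitive to outliers and noise. Both factors can lead to poor clustering results.

The paper proposes a new multi-center fuzzy C-means algorithm that combines transitive closure and spectral clustering. Transitive closure is a graph-theoretic tool for analyzing relations between nodes, and it captures the transitivity of relations between data points. In clustering, it helps identify implicit associations between samples and strengthens the connectivity and stability of the clusters. Spectral clustering uses the spectral properties of the data's similarity matrix and handles non-convex clusters effectively. By integrating the two into FCM, the authors aim to overcome the single-center restriction of traditional FCM while using spectral clustering to improve the handling of noise and complex structure.

The paper describes the design of the algorithm in detail, including how to construct the transitive closure matrix and how to combine it with spectral clustering to determine the multiple centers. The experiments compare the new algorithm with traditional FCM and other clustering methods on a variety of datasets, and the authors report that it performs better on complex data structures. The keywords are fuzzy C-means, multi-center, lattice similarity and spectral clustering, which indicate the main focus and theoretical basis of the work.

The paper offers a new tool for clustering complex datasets. By combining different theoretical concepts, it improves the accuracy and robustness of clustering, contributing to data mining and suggesting new approaches to complex real-world problems.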

Zeng et al. / Applied Soft Computing (2014) 89–101

Fig. 1. Schematic diagram of the non-convex shaped datasets classification. (a) Partitioning the dataset by the FCM algorithm with multi-centers; (b) the relationship graph of the similarity among subclusters; (c) the results of merging the subclusters. (For interpretation of the references to color in this text, the reader is referred to the web version of this article.)

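An efficient clustering criterion is the normalized cut (Ncut) [7], which is defined as

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)} \qquad (7)$$

where $\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} S_{ij}$ and $\mathrm{assoc}(A, V) = \sum_{i \in A,\, j \in V} S_{ij}$ is the total connection from the nodes in $A$ to all the nodes in $V$. In other words, the Ncut criterion computes the cut cost as a fraction of the total edge connections to all the nodes, and achieves a better balance of the cardinality of $A$ and $B$. Thus the aim is to minimize Ncut.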
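The Ncut criterion of Eq. (7) can be checked by hand on a tiny graph. The following is a minimal sketch, not code from the paper: the function names `assoc` and `ncut` and the 6-node similarity matrix `S` are made up for illustration, and `cut(A, B)` is taken to be the sum of similarities between the two parts.

```python
def assoc(S, rows, cols):
    """Sum of similarities S[i][j] over i in rows and j in cols."""
    return sum(S[i][j] for i in rows for j in cols)

def ncut(S, A, B):
    """Normalized cut of the two-way partition (A, B) of all nodes."""
    V = sorted(set(A) | set(B))          # all nodes of the graph
    cut = assoc(S, A, B)                 # total similarity crossing the cut
    return cut / assoc(S, A, V) + cut / assoc(S, B, V)

# Hypothetical similarity matrix: two tight triangles {0,1,2} and {3,4,5}
# joined by three weak links (0-3, 1-4, 2-5).
S = [
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.1, 0.8, 0.9, 0.0],
]

good = ncut(S, [0, 1, 2], [3, 4, 5])     # cuts only the weak links
bad = ncut(S, [0, 1, 3], [2, 4, 5])      # cuts through both triangles
print(good, bad)
```

The partition that separates the two triangles has a much smaller Ncut value than the one that splits them, which is the behaviour the criterion is designed to reward.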

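Following some algebraic formulations, it turns out [8] that minimizing the normalized cut can be equivalently recast as

$$\min_{H} \ \mathrm{Tr}\left(H^{T}(D - S)H\right) \quad \text{s.t.} \quad H^{T} D H = I \qquad (8)$$

where the partition satisfies $A_1 \cup \cdots \cup A_c = V$ and $A_i \cap A_j = \emptyset$ for $i \neq j$ ($i, j = 1, \ldots, c$), $c$ is the number of classes, $D$ is a diagonal matrix with $D_{ii} = \sum_j S_{ij}$, $I$ is the identity matrix, $H \in \mathbb{R}^{N \times c}$ is a specific discrete matrix, and $\mathrm{Tr}$ denotes the trace of a matrix.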

Unfortunately, solving the above discrete optimization problem is NP-hard. To make it tractable, an efficient relaxation is adopted: a real-valued problem is solved instead of the discrete-valued one. This is done by computing the first $k$ generalized eigenvectors, corresponding to the $k$ smallest eigenvalues, of the generalized eigenvalue problem

$$(D - S)z = \lambda D z \qquad (9)$$

where $\lambda$ is the eigenvalue and $z$ is the corresponding eigenvector. Furthermore, $D - S$ is the graph Laplacian matrix (its normalized form is $D^{-1/2}(D - S)D^{-1/2}$), which is symmetric positive semi-definite. Finally, a fuzzy C-means method is performed on the row vectors of $Z = [z_1, \ldots, z_k] \in \mathbb{R}^{N \times k}$ to obtain the clusters.
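The relaxation in Eq. (9) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes NumPy is available, the function name `spectral_embedding` and the matrix `S` are hypothetical, and the generalized problem is solved through the substitution $z = D^{-1/2} y$, which turns it into an ordinary symmetric eigenproblem for $D^{-1/2}(D - S)D^{-1/2}$.

```python
import numpy as np

def spectral_embedding(S, k):
    """Return the k smallest eigenvalues and eigenvectors of (D - S) z = lambda * D z."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                    # node degrees, the diagonal of D
    d_inv_sqrt = 1.0 / np.sqrt(d)        # assumes every node has positive degree
    L = np.diag(d) - S                   # graph Laplacian D - S
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, Y = np.linalg.eigh(L_sym)   # eigenvalues come back in ascending order
    Z = d_inv_sqrt[:, None] * Y[:, :k]   # back-substitute z = D^{-1/2} y
    return eigvals[:k], Z

# Hypothetical similarity matrix: two tight triangles joined by weak links.
S = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.1, 0.8, 0.9, 0.0],
])

eigvals, Z = spectral_embedding(S, k=2)
print(eigvals)       # the smallest eigenvalue is numerically zero
print(Z[:, 1])       # the sign pattern of the second eigenvector splits the two triangles
```

Each row of `Z` is the low-dimensional representation of one node; the text then clusters these rows with fuzzy C-means.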

3. The new multi-center FCM algorithm

Many datasets are non-convex shaped in practical clustering problems. As shown in Fig. 1(a), red stars and green stars represent samples belonging to different classes. The direct application of the FCM algorithm cannot achieve good results, because the FCM algorithm is only suitable for spherical clusters, cannot recognize irregular or elongated shaped clusters, and is sensitive to the initial centers of clusters. For the purpose of solving these problems, we propose the multi-center FCM algorithm based on transitive closure and spectral clustering (MFCM-TCSC). This method divides irregular classes into lots of subclusters (multi-cluster), and transforms the classification problem into a problem of merging subclusters. There is no need to consider the initial number of clusters, whether the distribution of the datasets is spherical, nor the local optimum problem of clustering.
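The first stage, over-partitioning a non-convex class into many small subclusters with FCM, can be sketched as follows. This is a plain textbook FCM written for illustration, not the authors' code: the function name `fcm`, the fuzzifier `m = 2`, the random initialization from samples, the half-circle test data and the choice of the integer square root of the sample count as the number of subclusters are all assumptions.

```python
import math
import random

def fcm(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means; returns (centers, U) with U[i][j] the membership of sample i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    centers = [list(x) for x in rng.sample(X, c)]   # start from c distinct samples
    U = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1))
        for i, x in enumerate(X):
            dist = [max(math.dist(x, v), 1e-12) for v in centers]  # clamp to avoid division by zero
            for j in range(c):
                U[i][j] = 1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0)) for l in range(c))
        # Center update: weighted mean of the samples with weights u_ij^m
        new_centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            new_centers.append([sum(w[i] * X[i][d] for i in range(n)) / total for d in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:                              # stop once the centers no longer move
            break
    return centers, U

# Hypothetical non-convex class: 36 points on a half circle.
X = [[math.cos(math.pi * t / 35), math.sin(math.pi * t / 35)] for t in range(36)]
c = math.isqrt(len(X))                               # 6 subclusters for 36 samples
centers, U = fcm(X, c)
print(len(centers), [round(v, 3) for v in centers[0]])
```

A single FCM center cannot describe the arc, but several centers spread along it can; the remaining task is to decide which of the resulting subclusters belong together.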

3.1. The merge criteria of subclusters in the multi-center fuzzy C-means clustering

In order to recognize the irregular classes, one class is represented by multiple subclusters. That is, the irregular class is first partitioned into some subclusters (multi-clusters), so that the classification problem can be converted into a subcluster merging problem. As shown in Fig. 1(a), the dataset is made up of two classes. When the multi-center FCM algorithm is adopted, the dataset is clustered into a number of subclusters: the first class is partitioned into one group of subclusters, and the second into another. Merging those subclusters gives the results shown in Fig. 1(c). In the process of merging the subclusters, two criteria are acquired from analyzing the results in Fig. 1: (1) the subclusters of the same class should satisfy the similarity and nearness relations (such as neighbouring subclusters in Fig. 1(a)); (2) the similarity can be transitive between the subclusters of the same class (in Fig. 1(a), the similarity between two subclusters that are not adjacent can be obtained by taking the maximal similarity value through a path of neighbouring subclusters ending at subcluster 4). In this paper, the number of subclusters is $\sqrt{N}$, where $N$ is the number of samples, which is the upper limit of the number of clusters based on the literature [5,14,15]. After the large classes in non-spherical datasets are divided into lots of subclusters using the FCM algorithm, the subclusters are merged under those two criteria.
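The second criterion, similarity that propagates along a chain of subclusters, is what a fuzzy transitive closure computes. The sketch below uses the standard max-min closure: the strength of a path is its weakest link, and the closure keeps the strongest path between each pair. The similarity matrix `R`, the function names and the final threshold-based grouping are hypothetical choices for illustration; the paper merges subclusters with spectral clustering, which is not reproduced here.

```python
def maxmin_compose(R, Q):
    """Max-min composition: entry (i, j) is the best two-step path strength i -> k -> j."""
    n = len(R)
    return [[max(min(R[i][k], Q[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(R):
    """Repeat R <- max(R, R o R) until nothing changes."""
    closure = [row[:] for row in R]
    while True:
        squared = maxmin_compose(closure, closure)
        nxt = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(closure, squared)]
        if nxt == closure:               # exact comparison is safe: only max/min are used
            return closure
        closure = nxt

def merge_by_threshold(T, lam):
    """Group subclusters whose closure similarity is at least lam (assumes T[i][i] >= lam)."""
    labels = []
    for i in range(len(T)):
        j = next(j for j in range(i + 1) if T[i][j] >= lam)   # first earlier subcluster linked to i
        labels.append(labels[j] if j < i else i)
    return labels

# Hypothetical similarities: subclusters 0-1-2-3 form a chain, subcluster 4 stands apart.
R = [
    [1.0, 0.8, 0.1, 0.0, 0.1],
    [0.8, 1.0, 0.7, 0.1, 0.0],
    [0.1, 0.7, 1.0, 0.9, 0.1],
    [0.0, 0.1, 0.9, 1.0, 0.2],
    [0.1, 0.0, 0.1, 0.2, 1.0],
]

T = transitive_closure(R)
print(T[0][3])                       # 0.7, reached through the chain 0 -> 1 -> 2 -> 3
print(merge_by_threshold(T, 0.5))    # [0, 0, 0, 0, 4]
```

Subclusters 0 and 3 have no direct similarity, yet the closure links them through their neighbours, so the two ends of an elongated class can still be merged.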

3.2. The representation of the subclusters' features

After the multi-center FCM algorithm clustering, the centers and the fuzzy membership values of the subclusters will be derived. The representation of the features of subclusters can be studied from two aspects: the centers of subclusters and the information in the fuzzy membership matrix.

(1) Replace subclusters with the features of their centers. That is, the features of the center of the ith subcluster are used as the features of the ith subcluster. This is a sparsification of the samples, which can reduce the computational load of the similarity measure and the spectral clustering. But the position of a center is closely related to the quantity of samples in the cluster which the center represents. If one class is partitioned sparsely and another densely, …

