A New Fuzzy Clustering Method for Incomplete Data with Human-Computer Cooperation

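"A Fuzzy Clustering Algorithm for Incomplete Data with Human-Computer Cooperation"

Data sets that contain missing values are a common challenge in modern data analysis. Missing values can noticeably distort clustering results, so handling them well is key to improving clustering performance. The paper "A Novel Fuzzy Clustering Algorithm with Human-computer Cooperation for Incomplete Data" addresses this problem.

Fuzzy c-means (FCM) is a widely used clustering method that describes the relationship between data points and clusters through membership functions from fuzzy mathematics. On incomplete data, however, the traditional FCM algorithm may fail to give accurate results. The paper therefore proposes a new algorithm that brings human-computer cooperation into the handling of missing data.

For the missing attributes, the paper uses intervals built with the nearest-neighbor rule. Each missing value is given a range rather than a single point estimate, which reflects its uncertainty and leaves more flexibility in handling it. The algorithm first determines the range from the nearest neighbors and then optimizes the value within that range using the rest of the data.

To compute the missing attributes, the algorithm combines two strategies: the optimal completion strategy (OCS) and a compulsion strategy. OCS looks for the fill-in values that minimize the squared error over the whole data set, while the compulsion strategy may constrain the filling process according to domain knowledge or user input. Combining the two lets the algorithm keep the data complete while taking the insight and knowledge of human experts into account.

In the experiments, the paper validates the proposed algorithm on several real data sets. The results indicate that, compared with traditional methods for handling missing data, the human-computer cooperative fuzzy clustering algorithm gives better clustering results, especially on complex and uncertain data, which supports its effectiveness for clustering incomplete data.

The study stresses the importance of human-computer interaction in data mining, particularly for the complexity and uncertainty of missing data. A human-computer cooperation mechanism allows more accurate data recovery and clustering, which matters for the quality and accuracy of data analysis. The method also suggests a direction for future clustering algorithms: combining human judgment with machine computation to meet the challenges of the Big Data era.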

A Novel Fuzzy Clustering Algorithm with Human-computer Cooperation for Incomplete Data

Li Zhang and Lu Wang

School of Information

Liaoning University

Shenyang, Liaoning Province, China

E-mail: zhang_li@lnu.edu.cn

Liyong Zhang

School of Control Science and Engineering

Dalian University of Technology

Dalian, Liaoning Province, China

E-mail: zhly@dlut.edu.cn

Abstract: Data sets with missing values are frequent in clustering analysis. The reconstruction of missing attribute values is a key factor affecting clustering performance. To this end, an FCM clustering algorithm for incomplete data sets based on human-computer cooperation is proposed in this paper. To account for the uncertainty of missing attributes, intervals are assigned to the missing attributes based on the nearest-neighbor rule. Furthermore, a corresponding iterative solution approach is developed for calculating the missing attributes based on the optimal completion strategy (OCS) and a compulsion strategy. Experimental results on several data sets demonstrate the superiority of the proposed algorithm.

Keywords: incomplete data; nearest-neighbor interval; human-computer cooperation; optimal completion strategy; fuzzy clustering

I. INTRODUCTION

With the advent of the era of Big Data, various algorithms for data clustering have been applied in more and more fields [1,2,3]. The fuzzy c-means algorithm (FCM) [4] is effective at partitioning complete data sets with overlapping clusters [5, 6]. However, incomplete data is a common problem in practice, and FCM cannot be applied directly to the clustering analysis of incomplete data sets.

In order to reduce the effects of incomplete data on clustering, a variety of approaches have been proposed. Hathaway and Bezdek proposed four specific strategies for the clustering of incomplete data [7]. The whole-data strategy (WDS) discards the data with missing attribute values. The partial distance strategy (PDS) ignores all missing attributes and calculates partial distances, as proposed by Dixon [8], using all available feature values. The optimal completion strategy (OCS) regards the missing attributes as additional variables to be optimized during each iteration. In the nearest prototype strategy (NPS), the missing attributes are set to the corresponding attribute values of the nearest prototype. Besides, Lin and Su [9] adopted a meta-heuristic technique, which combined the Electromagnetism-like Mechanism with RBC. Bing and Zhang et al. [10] proposed a hybrid fuzzy clustering algorithm based on PSO and FCM for incomplete data clustering.
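The partial distance used by PDS can be illustrated with a short sketch. This is a minimal example rather than code from the cited works: the function name `partial_sq_distance` and the `None` encoding of missing attributes are assumptions, and the rescaling by the ratio of total to observed attributes follows the usual formulation of the partial distance, which the text above does not spell out.

```python
from typing import Optional, Sequence


def partial_sq_distance(x: Sequence[Optional[float]], v: Sequence[float]) -> float:
    """Squared partial distance between an incomplete sample x and a prototype v.

    Missing attributes of x are encoded as None (an assumption of this sketch).
    """
    if len(x) != len(v):
        raise ValueError("x and v must have the same number of attributes")
    # Keep only the attribute pairs where the sample value is observed.
    observed = [(xj, vj) for xj, vj in zip(x, v) if xj is not None]
    if not observed:
        raise ValueError("sample has no observed attributes")
    partial = sum((xj - vj) ** 2 for xj, vj in observed)
    # Rescale so samples with different numbers of observed attributes stay comparable.
    return partial * len(x) / len(observed)
```

For a complete sample the scaling factor is 1, so the function reduces to the ordinary squared Euclidean distance.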

Interval numbers play an increasingly important role in clustering analysis [11,12]. Given the uncertainty of missing attributes, replacing missing attributes with intervals can improve the robustness of missing attribute estimation. Wang et al. [13] adopted the nearest-neighbor rule to select the training samples for missing attributes. Moreover, to prevent the endpoints of the intervals from being determined by information from different species, the NIR approach was developed by Zhang et al. [14] for interval estimation.

In this paper, an FCM clustering algorithm based on a human-computer cooperation strategy (HCFCM) is proposed to estimate the missing attributes of incomplete data sets. It solves the clustering problem in two steps. Firstly, according to the partial Euclidean distance [8], the q nearest neighbors of each missing attribute are selected; their mathematical expectation is regarded as the ME attribute value, and their maximum and minimum values are taken as the upper and lower bounds of the interval constraint. Secondly, following the optimal completion strategy (OCS), the attribute value is calculated iteratively along with the memberships and cluster prototypes. To ensure that the attributes always satisfy the interval constraints during iteration, a compulsion strategy based on nearest-neighbor expectations is proposed, which brings the characteristics of human-computer cooperation to the algorithm.
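The first step, and one possible reading of the compulsion strategy, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `partial_distance_sq`, `neighbor_interval` and `compel` are hypothetical, missing attributes are encoded as `None`, and the rule in `compel` (reset an out-of-range estimate to the nearest-neighbor mean) is an assumed interpretation, since the exact rule is not given in this section.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Sequence[Optional[float]]


def partial_distance_sq(a: Sample, b: Sample) -> Optional[float]:
    """Squared partial Euclidean distance over attributes observed in both samples."""
    shared = [(aj, bj) for aj, bj in zip(a, b) if aj is not None and bj is not None]
    if not shared:
        return None  # no attribute in common, so the distance is undefined
    return sum((aj - bj) ** 2 for aj, bj in shared) * len(a) / len(shared)


def neighbor_interval(data: List[Sample], k: int, j: int, q: int) -> Tuple[float, float, float]:
    """Return (lower, mean, upper) for the missing attribute j of sample k."""
    candidates = []
    for idx, other in enumerate(data):
        if idx == k or other[j] is None:
            continue  # a neighbor must actually carry attribute j
        d = partial_distance_sq(data[k], other)
        if d is not None:
            candidates.append((d, other[j]))
    if not candidates:
        raise ValueError("no usable neighbor for this attribute")
    candidates.sort(key=lambda pair: pair[0])
    values = [value for _, value in candidates[:q]]  # attribute j of the q nearest
    return min(values), sum(values) / len(values), max(values)


def compel(estimate: float, lower: float, mean: float, upper: float) -> float:
    """Assumed compulsion rule: fall back to the neighbor mean when out of range."""
    return estimate if lower <= estimate <= upper else mean
```

In an iterative scheme, `compel` would be applied to every missing attribute after each OCS update, so that no estimate leaves the interval obtained from its q nearest neighbors.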

II. THE OCS VERSION OF FCM

The OCS-FCM algorithm, proposed by Hathaway and Bezdek [7], is an imputation method. It regards the missing attributes $X_M$ as additional variables and calculates them during each iteration to complete the missing part of the data set. Let the objective function be

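$$J(U, V, X_M) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert x_k - v_i \rVert_2^2 . \tag{1}$$

with the constraint of:

$$\sum_{i=1}^{c} u_{ik} = 1, \quad k = 1, 2, \ldots, n . \tag{2}$$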
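As a small illustration of (1) and (2), the sketch below evaluates the objective for a completed data set and checks the membership constraint. It assumes the standard FCM form of the objective with fuzzifier m; the function names `fcm_objective` and `satisfies_constraint` and the list-of-lists layout (`U[i][k]` for cluster i and sample k) are choices of this example only.

```python
from typing import List

Matrix = List[List[float]]


def fcm_objective(U: Matrix, V: Matrix, X: Matrix, m: float = 2.0) -> float:
    """J = sum over clusters i and samples k of u_ik^m * ||x_k - v_i||^2."""
    return sum(
        U[i][k] ** m * sum((xj - vj) ** 2 for xj, vj in zip(X[k], V[i]))
        for i in range(len(V))
        for k in range(len(X))
    )


def satisfies_constraint(U: Matrix, tol: float = 1e-9) -> bool:
    """Check that the memberships of every sample sum to 1 over all clusters."""
    n = len(U[0])
    return all(abs(sum(row[k] for row in U) - 1.0) <= tol for k in range(n))
```

Here `X` must already be complete: under OCS the missing entries are themselves variables of the optimization, so they hold their current estimates whenever the objective is evaluated.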

The necessary conditions for minimizing the objective function (1) lead to an iterative process, in which the membership degrees $u_{ik}$, the cluster prototypes $v_i$ and the missing values are updated by

Projectsupported by the National Nature Science Foundation of China

(No.61174115, No. 61401061)
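The alternating update of memberships, prototypes and missing values can be sketched in a few lines. This is a minimal sketch, not the paper's implementation: it uses the textbook OCS-FCM updates usually attributed to Hathaway and Bezdek [7], which are assumed here because the update equations are not reproduced in this section. The function name `ocs_fcm`, the `None` encoding of missing values, the column-mean initialisation and the farthest-point choice of starting prototypes are all assumptions of this example; it also assumes m > 1 and at least one observed value per attribute.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((aj - bj) ** 2 for aj, bj in zip(a, b))


def ocs_fcm(data: List[List[Optional[float]]], c: int, m: float = 2.0,
            max_iter: int = 100, tol: float = 1e-6) -> Tuple[Matrix, Matrix, Matrix]:
    """Return (U, V, X): memberships U[i][k], prototypes V[i], completed data X[k]."""
    n, s = len(data), len(data[0])
    missing = [(k, j) for k in range(n) for j in range(s) if data[k][j] is None]
    # Assumed initialisation: fill each gap with the mean of the observed column values.
    col_mean = []
    for j in range(s):
        seen = [row[j] for row in data if row[j] is not None]
        col_mean.append(sum(seen) / len(seen))
    X = [[col_mean[j] if x is None else x for j, x in enumerate(row)] for row in data]
    # Assumed initialisation: first sample, then the sample farthest from chosen prototypes.
    V = [list(X[0])]
    while len(V) < c:
        V.append(list(max(X, key=lambda x: min(sq_dist(x, v) for v in V))))
    U = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Membership update; a sample sitting exactly on prototypes is shared among them.
        for k in range(n):
            d = [sq_dist(X[k], V[i]) for i in range(c)]
            zero = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if zero:
                    U[i][k] = 1.0 / len(zero) if i in zero else 0.0
                else:
                    U[i][k] = 1.0 / sum((d[i] / d[l]) ** (1.0 / (m - 1.0)) for l in range(c))
        # Prototype update: membership-weighted mean of the completed samples.
        V_new = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            V_new.append([sum(w[k] * X[k][j] for k in range(n)) / sum(w) for j in range(s)])
        # Missing-value update: membership-weighted mean of the prototype coordinates.
        for k, j in missing:
            w = [U[i][k] ** m for i in range(c)]
            X[k][j] = sum(w[i] * V_new[i][j] for i in range(c)) / sum(w)
        shift = max(abs(V_new[i][j] - V[i][j]) for i in range(c) for j in range(s))
        V = V_new
        if shift < tol:
            break
    return U, V, X
```

Observed attributes are never modified; only the entries listed in `missing` change between iterations. An interval constraint such as the one described in the Introduction would be enforced right after the missing-value update.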
