Application of a Knowledge-Granulation-Based Hierarchical Cluster Ensemble Model in Unsupervised Learning


"这篇研究论文探讨了一种基于知识粒化的层次聚类集成模型，旨在解决无监督分类学习中的复杂性问题，如不确定性、模糊性和重叠。文章指出，聚类集成方法通过生成多种不同的聚类解决方案并将其整合成最终决策，已被证明在未标记数据的分类中非常有效。然而，由于基础聚类结果的内在复杂性，这项任务变得更加困难。论文引入了粒计算和粗糙集理论来应对这些挑战。" 正文: 基于知识粒化的层次聚类集成模型是当前数据挖掘和机器学习领域的一个重要研究方向。聚类分析是无监督学习的关键技术，用于发现数据集中的自然群体结构，而聚类集成则进一步提高了聚类的稳定性和准确性。然而，基础聚类结果通常存在不确定性、模糊性和重叠等问题，这给集成过程带来了困难。论文中提到的知识粒化（Knowledge Granulation）是粒计算（Granular Computing）的核心概念，它通过将数据细化为更抽象或具体的粒度，以处理信息的不完整性和不确定性。粒度可以理解为数据的抽象级别，允许我们从不同层面理解和操作数据。在聚类中，粒化可以帮助减少噪声，暴露更清晰的模式，并降低复杂性。粗糙集理论（Rough Sets Theory）是一种处理不完全信息系统的数学工具，它允许对不确定和不精确的数据进行操作。在聚类集成中，粗糙集理论可以用来处理模糊边界和重叠的聚类，通过揭示数据的近似性质，帮助构建更稳健的聚类模型。该研究提出的层次聚类集成模型结合了这两种理论，通过知识粒化形成不同粒度的聚类，然后利用粗糙集理论处理聚类结果的不确定性，从而创建一个多层次的集成框架。这种方法不仅能够融合多样性的聚类结果，还能减少噪声，提高聚类的鲁棒性。论文的贡献在于提供了一个新的视角来解决聚类集成中的复杂性问题，通过粒化和粗糙集的结合，提高了集成模型的性能。这种方法对于大数据分析、模式识别和其他依赖于无监督学习的应用具有潜在价值，能够更好地应对现实世界中数据的复杂性和不确定性。这篇研究论文展示了知识粒化和粗糙集理论在构建层次聚类集成模型中的创新应用，为改进无监督学习提供了新的思路和方法。通过深入理解和应用这些理论，可以进一步提升数据聚类的准确性和可靠性，为未来的研究和实践开辟新的道路。

ARTICLE IN PRESS


Knowledge-Based Systems 000 (2015) 1–10


Knowledge-Based Systems

journal homepage: www.elsevier.com/locate/knosys

Hierarchical cluster ensemble model based on knowledge granulation

Jie Hu, Tianrui Li (a,∗), Hongjun Wang, Hamido Fujita

School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China

Faculty of Software and Information Science, Iwate Prefectural University, 020-0693, Iwate, Japan

article info

Article history:

Received 26 July 2015

Revised 14 September 2015

Accepted 3 October 2015

Available online xxx

Keywords:

Cluster ensemble

Granular computing

Rough sets

abstract

Cluster ensembles have been shown to be very effective in unsupervised classification learning by generating a large pool of different clustering solutions and then combining them into a final decision. However, the task becomes more difficult due to the inherent complexities among base cluster results, such as uncertainty, vagueness and overlapping. Granular computing is one of the fastest growing information-processing paradigms in the domain of computational intelligence and human-centric systems. As the core part of granular computing, rough set theory, which deals with inexact, uncertain, or vague information, has been widely applied in machine learning and knowledge discovery related areas in recent years. From these perspectives, this paper proposes a hierarchical cluster ensemble model based on knowledge granulation, in an attempt to provide a new way to deal with the cluster ensemble problem together with an ensemble learning application of knowledge granulation. A novel rough distance is introduced to measure the dissimilarity between base partitions, and the notion of knowledge granulation is improved to measure the agglomeration degree of a given granule. Furthermore, a novel objective function for cluster ensembles is defined and the corresponding inferences are made. A hierarchical cluster ensemble algorithm based on knowledge granulation is designed. Experimental results on real-world data sets demonstrate the effectiveness of the proposed method for cluster ensembles.

1. Introduction

Clustering is an important unsupervised classification technique, which has been extensively researched in different fields such as statistics, pattern recognition, machine learning, and data mining [1–3]. Following several clustering criteria and different methods of similarity measurement, the underlying structure of a data set can be revealed, e.g., similar objects can be grouped into the same cluster, while dissimilar objects can be assigned to different clusters [4].

In practice, each clustering algorithm has its own strengths and weaknesses, and no single clustering algorithm is capable of delivering sound solutions for all data sets. With the objective of improving the robustness, consistency, novelty and stability of a single clustering algorithm's results, the cluster ensemble (also called cluster fusion or consensus clustering) has emerged as a tool for leveraging the consensus across multiple clustering results and combining them into an optimal solution. It has gained increasing attention from researchers in recent years [4–8].

∗ Corresponding author. Tel.: +86 28 66367458.

E-mail addresses: jiehu@swjtu.edu.cn (J. Hu), trli@swjtu.edu.cn, trli30@gmail.com (T. Li), wanghongjun@swjtu.edu.cn (H. Wang), HFujita-799@acm.org (H. Fujita).

Generally, a cluster ensemble method involves two major steps: Generation and Consensus Function. In the first step, a set of diverse partitions of objects is produced using a generative mechanism, such as a homogeneous algorithm with different parameters (or initializations) [9,10] or heterogeneous algorithms [5]. The consensus function is the main step in any cluster ensemble algorithm, by which a new partition is acquired by integrating all partitions obtained in the generation step. There are numerous consensus function approaches, which can be classified into two main types: methods based on object co-occurrence and methods based on the median partition [4]. In the median partition based approach, the resulting partition is acquired by finding an optimal partition that maximizes the similarity (or minimizes the dissimilarity) with all partitions in the cluster ensemble. Although a great number of cluster ensemble methods have been proposed over the past years, there are relatively few techniques for handling uncertain, vague and overlapping information in the cluster ensemble process.
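The two families of consensus functions described above can be illustrated with a small sketch. It assumes the generation step has already produced three base partitions (hard-coded here as hypothetical label lists). It then shows (1) a co-occurrence consensus that links objects placed together in more than half of the base partitions, and (2) a simple median-partition heuristic that searches only among the base partitions, using the number of disagreeing object pairs as the dissimilarity. The function names, the 0.5 threshold and the choice of dissimilarity are assumptions made for illustration; this is not the rough distance or the algorithm proposed in the paper.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions that put objects i and j in the same cluster."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_by_co_occurrence(partitions, threshold=0.5):
    """Merge objects whose co-association exceeds the threshold (union-find)."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel connected components as 0, 1, 2, ... in order of first appearance.
    new_label = {}
    return [new_label.setdefault(find(i), len(new_label)) for i in range(n)]


def pair_disagreements(p, q):
    """Number of object pairs grouped together in one partition but not the other."""
    return sum(
        (p[i] == p[j]) != (q[i] == q[j])
        for i, j in combinations(range(len(p)), 2)
    )


def best_of_base(partitions):
    """Median-partition heuristic restricted to the base partitions themselves."""
    return min(
        partitions,
        key=lambda p: sum(pair_disagreements(p, q) for q in partitions),
    )


# Hypothetical output of the generation step: three partitions of six objects.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]

co = co_association(base_partitions)
consensus = consensus_by_co_occurrence(base_partitions)
best = best_of_base(base_partitions)
print(consensus)  # [0, 0, 0, 1, 1, 1]
print(best)       # [0, 0, 0, 1, 1, 1]
```

Restricting the median search to the base partitions keeps the example short; a full median-partition method would search the space of all partitions, which is far larger.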

Granular computing (GrC) [11,12], which has emerged as one of the fastest growing information-processing paradigms in the domain of computational intelligence and human-centric systems, has been successfully applied in many fields. As a core part of GrC, rough set theory (RST) [13] forms granules through an equivalence relation defined on the objects of the universe and approximately expresses information granulation by using a pair of non-numerical operators, i.e., the lower and upper approximation operators [14]. Nowadays, RST has

http://dx.doi.org/10.1016/j.knosys.2015.10.006
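The lower and upper approximation operators mentioned in the introduction can be illustrated with a minimal sketch. It assumes a toy universe of six objects described by a single hypothetical attribute; objects with equal attribute values are indiscernible and form one granule (equivalence class). The lower approximation of a target set collects the granules fully contained in it, and the upper approximation collects the granules that intersect it. All names and data below are illustrative assumptions, not taken from the paper.

```python
def equivalence_classes(universe, attribute):
    """Group objects that share the same attribute value (indiscernible objects)."""
    classes = {}
    for obj in universe:
        classes.setdefault(attribute[obj], set()).add(obj)
    return list(classes.values())


def approximations(classes, target):
    """Return the (lower, upper) approximations of target under the given granules."""
    lower, upper = set(), set()
    for granule in classes:
        if granule <= target:   # granule lies entirely inside the target set
            lower |= granule
        if granule & target:    # granule overlaps the target set
            upper |= granule
    return lower, upper


# Hypothetical information table: object id -> attribute value.
attribute = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c", 6: "c"}
universe = set(attribute)
target = {1, 2, 3}  # the concept to approximate

classes = equivalence_classes(universe, attribute)
lower, upper = approximations(classes, target)
boundary = upper - lower

print(sorted(lower))     # [1, 2]
print(sorted(upper))     # [1, 2, 3, 4]
print(sorted(boundary))  # [3, 4]
```

Objects 3 and 4 cannot be told apart by the attribute, yet only object 3 belongs to the target set, so both fall into the boundary region. This is the kind of vagueness that the approximation pair makes explicit.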

