Application of a Hierarchical Cluster Ensemble Model Based on Knowledge Granulation to Unsupervised Classification


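"Hierarchical Cluster Ensemble Model Based on Knowledge Granulation" is a research paper that proposes a hierarchical cluster ensemble model based on knowledge granulation to address the complexity of unsupervised classification learning, in particular the uncertainty, vagueness and overlapping found among base clustering results. A cluster ensemble generates many different clustering solutions and combines them into a final decision, an approach that has proven effective in unsupervised learning; the inherent complexity of the base clustering results, however, makes this combination difficult.

Knowledge granulation is a notion from granular computing (GrC), a rapidly developing information-processing paradigm that is well suited to uncertain and vague data. The basic idea of granulation is to transform raw data into more abstract, manageable granules, which simplifies complex systems and can improve decision quality. A granule can be viewed as an organizational unit of data: the coarser the granularity, the more general the information it represents; the finer the granularity, the more specific.

The model uses knowledge granulation to handle uncertainty and vagueness in the data. Its hierarchical structure allows the data to be analyzed at different levels of granularity, so that useful insight can be obtained at several levels of abstraction. By forming clusters at each level, the model can capture the multi-scale structure of the data, which helps reduce the errors or omissions that a single clustering solution may cause.

The paper also draws on rough set theory, a mathematical tool for handling incomplete or imprecise information and a core part of GrC. Rough sets help identify and quantify the uncertainty and vagueness in data, which is essential when building a cluster ensemble because it helps determine the boundaries and similarities between clusters. According to the abstract, the paper introduces a novel rough distance to measure the dissimilarity between base partitions, improves the notion of knowledge granulation to measure the agglomeration degree of a given granule, defines a new objective function for cluster ensembles, and designs a hierarchical cluster ensemble algorithm based on knowledge granulation.

The model-building process may involve the following steps:

1. Data preprocessing: data cleaning, standardization and granulation to prepare the data for cluster analysis.
2. Hierarchical clustering: running clustering algorithms such as K-means, DBSCAN or spectral clustering at different levels of granularity.
3. Knowledge-granulation ensemble: using granulation theory to integrate the clustering results from different levels into a single unified decision.
4. Evaluation and optimization: comparing clustering results across levels and granularities to select the best combination, possibly using evaluation metrics such as a consistency index or V-measure.

By combining knowledge granulation with rough set theory, the paper contributes a new cluster ensemble method that improves clustering on complex data sets, and the abstract reports experiments on real-world data sets supporting its effectiveness. The model offers a more robust approach to unsupervised learning and is potentially valuable in big data analysis, pattern recognition and information mining.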
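The coarse-versus-fine intuition above can be made quantitative. The sketch below assumes the classical knowledge granulation measure from the rough set literature, GK(P) = (1/|U|²) Σ|X_i|², where X_1, ..., X_m are the blocks (granules) of a partition P of the universe U. It is not the paper's improved measure of a granule's agglomeration degree, whose definition is not given in this excerpt; the function name and the example partitions are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def knowledge_granulation(labels):
    """Classical knowledge granulation GK(P) = (1/|U|^2) * sum_i |X_i|^2.

    `labels[k]` is the block (granule) id of object k, so the partition P is
    given as a label vector. The result lies between 1/|U| (every object in
    its own granule) and 1 (one granule containing everything)."""
    n = len(labels)
    if n == 0:
        raise ValueError("the universe must contain at least one object")
    block_sizes = Counter(labels).values()
    return Fraction(sum(size * size for size in block_sizes), n * n)

fine = [0, 1, 2, 3, 4, 5]     # six singleton granules (finest granularity)
medium = [0, 0, 1, 1, 2, 2]   # three granules of size 2
coarse = [0, 0, 0, 0, 0, 0]   # a single granule (coarsest granularity)

for name, partition in [("fine", fine), ("medium", medium), ("coarse", coarse)]:
    print(name, knowledge_granulation(partition))
# fine 1/6
# medium 1/3
# coarse 1
```

Exact fractions are used so that the bounds 1/|U| and 1 come out exactly; coarser granulations always score higher, matching the "coarser means more general" reading.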

Knowledge-Based Systems 91 (2016) 179–188

Contents lists available at ScienceDirect

Knowledge-Based Systems

journal homepage: www.elsevier.com/locate/knosys

Hierarchical cluster ensemble model based on knowledge granulation

Jie Hu (a), Tianrui Li (a, ∗), Hongjun Wang (a), Hamido Fujita (b)

(a) School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China

(b) Faculty of Software and Information Science, Iwate Prefectural University, 020-0693, Iwate, Japan

article info

Article history:

Received 26 July 2015

Revised 14 September 2015

Accepted 3 October 2015

Available online 16 October 2015

Keywords:

Cluster ensemble

Granular computing

Rough sets

abstract

Cluster ensemble has been shown to be very effective in unsupervised classification learning by generating a large pool of different clustering solutions and then combining them into a final decision. However, the task becomes more difficult due to the inherent complexities among base clustering results, such as uncertainty, vagueness and overlapping. Granular computing is one of the fastest growing information-processing paradigms in the domain of computational intelligence and human-centric systems. As the core part of granular computing, rough set theory, which deals with inexact, uncertain, or vague information, has been widely applied in machine learning and knowledge discovery in recent years. From these perspectives, this paper proposes a hierarchical cluster ensemble model based on knowledge granulation, in an attempt to provide a new way of dealing with the cluster ensemble problem and to apply knowledge granulation to ensemble learning. A novel rough distance is introduced to measure the dissimilarity between base partitions, and the notion of knowledge granulation is improved to measure the agglomeration degree of a given granule. Furthermore, a novel objective function for cluster ensembles is defined and the corresponding inferences are made. A hierarchical cluster ensemble algorithm based on knowledge granulation is designed. Experimental results on real-world data sets demonstrate the effectiveness of the proposed method for cluster ensembles.
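To illustrate how rough set theory can expose uncertainty and overlap between base partitions, the sketch below treats one base partition as the equivalence relation and computes the standard lower and upper approximations of a cluster taken from another base partition. The boundary region (upper minus lower) marks objects on which the two partitions disagree. This is textbook rough set machinery chosen for illustration, not the paper's novel rough distance or objective function, which this excerpt does not define; all function names and partitions are hypothetical.

```python
from collections import defaultdict

def blocks(labels):
    """Equivalence classes of a partition given as a label vector."""
    groups = defaultdict(set)
    for obj, lab in enumerate(labels):
        groups[lab].add(obj)
    return list(groups.values())

def approximations(labels, target):
    """Lower and upper approximations of `target` (a set of object indices)
    with respect to the equivalence classes induced by `labels`."""
    lower, upper = set(), set()
    for block in blocks(labels):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

def accuracy(labels, target):
    """Approximation accuracy |lower| / |upper|; 1.0 means no boundary region."""
    lower, upper = approximations(labels, target)
    return len(lower) / len(upper) if upper else 1.0

# Two hypothetical base partitions of six objects.
p1 = [0, 0, 0, 1, 1, 1]
p2 = [0, 0, 1, 1, 2, 2]
cluster = {i for i, lab in enumerate(p1) if lab == 0}   # {0, 1, 2}

low, up = approximations(p2, cluster)
print(sorted(low), sorted(up), accuracy(p2, cluster))
# [0, 1] [0, 1, 2, 3] 0.5
```

Objects 2 and 3 fall in the boundary region: partition `p2` cannot decide whether they belong to the cluster that `p1` defines, which is exactly the kind of vagueness a consensus step has to resolve.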

1. Introduction

Clustering is an important unsupervised classification technique that has been extensively researched in fields such as statistics, pattern recognition, machine learning, and data mining [1–3]. Given a clustering criterion and a similarity measure, clustering can reveal the underlying structure of a data set: similar objects are grouped into the same cluster, while dissimilar objects are assigned to different clusters [4].
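The grouping principle described above can be seen in k-means, one common clustering algorithm (also named in the summary). The sketch below is a plain Lloyd's iteration on 2-D points under a stated simplification: the initial centroids are just the first k points, which is not a recommended initialization. It is a generic illustration, not part of the paper's method, and the function name and data are hypothetical.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means on 2-D points given as tuples.

    Initial centroids are simply the first k points (an illustrative shortcut).
    Returns a label vector: labels[i] is the cluster index of points[i]."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:   # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return labels

points = [(0, 0), (5, 5), (0, 1), (1, 0), (5, 6), (6, 5)]
print(kmeans(points, 2))  # [0, 1, 0, 0, 1, 1]
```

Nearby points end up with the same label and distant points with different labels; rerunning with other initializations is one way to obtain the diverse base partitions a cluster ensemble needs.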

Each clustering algorithm has its own strengths and weaknesses, and no single clustering algorithm delivers sound solutions for all data sets. To improve the robustness, consistency, novelty and stability of the results of single clustering algorithms, cluster ensemble (also called cluster fusion or consensus clustering) has emerged as a tool for leveraging the consensus across multiple clustering results and combining them into an optimal solution. It has attracted increasing attention from researchers in recent years [4–8].

∗ Corresponding author. Tel.: +86 28 66367458.

E-mail addresses: jiehu@swjtu.edu.cn (J. Hu), trli@swjtu.edu.cn, trli30@gmail.com (T. Li), wanghongjun@swjtu.edu.cn (H. Wang), HFujita-799@acm.org (H. Fujita).

Generally, a cluster ensemble method involves two major steps: generation and consensus function. In the first step, a set of diverse partitions of the objects is produced by a generative mechanism, such as a homogeneous algorithm run with different parameters (or initializations) [9,10] or heterogeneous algorithms [5]. The consensus function is the main step of any cluster ensemble algorithm: it acquires a new partition by integrating all partitions obtained in the generation step. The numerous consensus function approaches fall into two main types: methods based on object co-occurrence and methods based on the median partition [4]. In the median partition approach, the resulting partition is the one that maximizes the similarity (or minimizes the dissimilarity) with all partitions in the cluster ensemble. Although a great number of cluster ensemble methods have been proposed over the past years, relatively few techniques handle uncertain, vague and overlapping information in the cluster ensemble process.
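The two families of consensus functions named above can be contrasted in a few lines. The sketch below assumes (1) for the co-occurrence family, a co-association matrix cut at a 0.5 threshold with single linkage, and (2) for the median-partition family, a restricted "best-of" median chosen among the base partitions, using the count of disagreeing object pairs (a Mirkin-style distance) as the dissimilarity. Both the threshold and the distance are illustrative choices standing in for the paper's rough distance and objective function, which are not reproduced here; all names and data are hypothetical.

```python
from itertools import combinations

def co_association(partitions):
    """Object co-occurrence matrix: entry (i, j) is the fraction of base
    partitions that place objects i and j in the same cluster."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]

def consensus_by_threshold(ca, threshold=0.5):
    """Link objects whose co-association exceeds `threshold` and return the
    connected components as consensus labels (a single-link cut)."""
    n = len(ca)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:                      # depth-first flood fill
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and ca[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

def pair_disagreements(a, b):
    """Count object pairs grouped together in one partition but apart in the
    other (a Mirkin-style distance, used here as a stand-in dissimilarity)."""
    return sum((a[i] == a[j]) != (b[i] == b[j])
               for i, j in combinations(range(len(a)), 2))

def best_of_median(partitions):
    """Restricted median partition: the base partition with the smallest total
    distance to all base partitions."""
    return min(partitions,
               key=lambda p: sum(pair_disagreements(p, q) for q in partitions))

# Hypothetical base partitions of six objects (e.g. from different runs).
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],   # same grouping as the first, different label names
    [0, 0, 1, 1, 2, 2],
]
print(consensus_by_threshold(co_association(base)))  # [0, 0, 0, 1, 1, 1]
print(best_of_median(base))                          # [0, 0, 0, 1, 1, 1]
```

Both functions compare objects pairwise rather than comparing raw label values, so the second base partition, which only renames the clusters of the first, counts as full agreement with it.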

Granular computing (GrC) [11,12], which has emerged as one of the fastest growing information-processing paradigms in the domain of computational intelligence and human-centric systems, has been successfully applied in many fields. As a core part of GrC, rough set theory (RST) [13] forms granules through an equivalence relation defined on the objects of the universe and approximately expresses information granulation by a pair of non-numerical operators, i.e., the lower and upper approximation operators [14]. Nowadays, RST has

http://dx.doi.org/10.1016/j.knosys.2015.10.006


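Computer Research - Research on Hadoop-Based Cluster Ensemble Methods.pdf

Application of a Hierarchical Cluster Ensemble Model Based on Knowledge Granulation to Unsupervised Learning

Research on Text Clustering Ensemble Methods Based on Spectral Decomposition

Customer Segmentation Based on Selective Cluster Ensembles

Harnessing the Wisdom of Crowds: A Multi-Granularity Approach to Cluster Ensembles

Intelligent Diagnosis of Rotating Machinery Based on Ant Colony SVDD and Clustering Methods

The CHAMELEON Algorithm: Dynamic Hierarchical Clustering in Data Mining

Innovative Hierarchical Clustering and a New Similarity Measure: Improving the Efficiency of Big Data Mining

Fast Plane Extraction: Agglomerative Hierarchical Clustering Applied to Point Cloud Data

Integrating LSI and CVX: Spark-Based Sparse Convex Clustering and ADMM Applications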
