Discrimination-Aware Manifold Regularization Improves Semi-Supervised Classification


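This article presents a new approach to semi-supervised classification learning: discrimination-aware manifold regularization (DAMR). The authors are Yunyun Wang, Songcan Chen, Hui Xue and Zhenyong Fu. They are affiliated with the computer science and engineering departments of Nanjing University of Posts and Telecommunications, Nanjing University of Aeronautics and Astronautics, and Southeast University.

Manifold regularization (MR) is a widely used semi-supervised learning technique. It uses both labeled and unlabeled data to build a single Laplacian graph over the whole dataset. This graph represents the local similarity between instances and the low-dimensional manifold on which the data are assumed to lie. A Laplacian regularizer then imposes a smoothness constraint over the graph during training, so that neighboring instances receive consistent predictions. This smoothness principle is key to making good use of scarce labeled data.

The weakness the authors target is that smoothness over a single graph may ignore the discrimination among boundary instances. Instances that are very close on the manifold can still belong to different classes. Earlier work that combined discrimination with smoothness relied only on the labeled instances, which limits its benefit.

DAMR instead works in two steps:
1. It performs unsupervised clustering over the whole dataset to discover possible discrimination among all available instances.
2. It incorporates that discrimination into MR. Instances that are highly similar on the manifold must share the same class label if they belong to the same cluster, and must have different class labels otherwise.

The regularizer therefore preserves the intrinsic data structure while emphasizing the boundaries between classes. The abstract does not specify the optimization procedure. The reported experiments show DAMR to be competitive with MR and with MR variants that likewise incorporate discrimination.

The article was received on 13 December 2013, revised on 20 April 2014, accepted on 23 June 2014, communicated by Feiping Nie, and published online on 30 June 2014. Its keywords are semi-supervised classification, manifold regularization, discrimination, and unsupervised clustering. The core contribution is an improved semi-supervised classification method. Through discrimination-aware manifold regularization, it aims to classify effectively when data distributions are complex and labels are scarce, which gives it theoretical and practical value for data mining and machine learning.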
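The following minimal sketch makes the Laplacian smoothness idea concrete. It uses plain Python (standard library only) and builds the unnormalized Laplacian L = D − W, then evaluates the regularizer f^T L f. The regularizer is small when neighboring instances receive the same output.

The graph construction is an illustrative assumption, not a detail taken from the paper: a symmetric k-nearest-neighbour graph with Gaussian weights. The parameters `k` and `sigma` and the function names are also assumptions.

```python
import math


def gaussian_knn_weights(points, k=2, sigma=1.0):
    """Symmetric k-nearest-neighbour graph with Gaussian (heat-kernel) edge weights."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            # Union symmetrisation: the edge exists if either endpoint picks the other.
            w[i][j] = w[j][i] = math.exp(-d * d / (2 * sigma * sigma))
    return w


def laplacian(w):
    """Unnormalised graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def smoothness(f, lap):
    """Laplacian regulariser f^T L f, which equals 1/2 * sum_ij W_ij (f_i - f_j)^2."""
    n = len(f)
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))


# Two tight, well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
w = gaussian_knn_weights(points, k=2)
lap = laplacian(w)
smooth_labels = [1, 1, 1, -1, -1, -1]  # constant inside each group
rough_labels = [1, -1, 1, -1, 1, -1]   # flips sign inside each group
print(smoothness(smooth_labels, lap) < smoothness(rough_labels, lap))  # True
```

A labelling that is constant within each group costs essentially nothing. A labelling that flips sign inside a group is penalized. This is the kind of smoothness that MR enforces over its single graph.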

Semi-supervised classification learning by discrimination-aware manifold regularization

Yunyun Wang (a,b), Songcan Chen, Hui Xue, Zhenyong Fu

a Department of Computer Science and Engineering, Nanjing University of Posts & Telecommunications, Nanjing 210046, PR China

b Department of Computer Science and Engineering, Nanjing University of Aeronautics & Astronautics, Nanjing 210016, PR China

c School of Computer Science and Engineering, Southeast University, Nanjing 210096, PR China

Article info

Article history:
Received 13 December 2013
Received in revised form 20 April 2014
Accepted 23 June 2014
Communicated by Feiping Nie
Available online 30 June 2014

Keywords: Semi-supervised classification; Manifold regularization; Discrimination; Unsupervised clustering

Abstract

Manifold regularization (MR) provides a powerful framework for semi-supervised classification (SSC) using both labeled and unlabeled data. It first constructs a single Laplacian graph over the whole dataset to represent the manifold structure. It then enforces a smoothness constraint over that graph through a Laplacian regularizer during learning.

However, smoothness over a single Laplacian graph risks ignoring the discrimination among boundary instances. Such instances are very likely to come from different classes even though they are highly close to each other on the manifold. To compensate for this deficiency, research has already been devoted to taking discrimination into account together with smoothness in learning. However, those works are confined to the discrimination of the labeled instances, and are thus rather limited in boosting semi-supervised learning.

To mitigate this unfavorable situation, we first attempt to discover the possible discrimination in the available instances by performing unsupervised clustering over the whole dataset. We then incorporate it into MR to develop a novel discrimination-aware manifold regularization (DAMR) framework. In DAMR, instances with high similarity on the manifold are restricted to share the same class label if they belong to the same cluster, and to have different class labels otherwise. Our empirical results show the competitiveness of DAMR compared to MR and its variants that likewise incorporate discrimination in learning.
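The abstract describes DAMR only through its constraint: same cluster means same label, and different cluster means different label. The sketch below is one hypothetical way to turn that constraint into a regularizer. It is not the authors' formulation.

The sketch assumes binary outputs in {−1, +1}:
- Same-cluster neighbours are penalized by (f_i − f_j)².
- Cross-cluster neighbours are penalized by (f_i + f_j)², which vanishes when their outputs have opposite signs.

The k-means clustering, the ε-neighbourhood graph, all parameter values and every function name are assumptions made for illustration.

```python
import math


def kmeans(points, init_idx, iters=20):
    """Plain Lloyd's k-means; init_idx picks the starting centres (deterministic demo)."""
    centres = [points[i] for i in init_idx]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        assign = [min(range(len(centres)), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its members (keep it if empty).
        for c in range(len(centres)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centres[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assign


def epsilon_graph(points, eps=0.6, sigma=0.5):
    """Connect points closer than eps, with Gaussian edge weights."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                w[i][j] = w[j][i] = math.exp(-d * d / (2 * sigma ** 2))
    return w


def mr_penalty(f, w):
    """Plain Laplacian smoothness: sum over edges of W_ij (f_i - f_j)^2."""
    n = len(f)
    return sum(w[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(i + 1, n))


def damr_like_penalty(f, w, clusters):
    """Hypothetical cluster-aware variant: pull same-cluster neighbours together,
    push cross-cluster neighbours towards opposite signs."""
    n = len(f)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if w[i][j]:
                same = clusters[i] == clusters[j]
                gap = f[i] - f[j] if same else f[i] + f[j]
                total += w[i][j] * gap ** 2
    return total


points = [(x / 2,) for x in range(8)]        # evenly spaced chain 0.0, 0.5, ..., 3.5
w = epsilon_graph(points)
clusters = kmeans(points, init_idx=[0, 7])   # -> [0, 0, 0, 0, 1, 1, 1, 1]
split = [1, 1, 1, 1, -1, -1, -1, -1]
constant = [1] * 8
print(mr_penalty(constant, w) < mr_penalty(split, w))                                   # True
print(damr_like_penalty(split, w, clusters) < damr_like_penalty(constant, w, clusters))  # True
```

On an evenly spaced chain, plain MR smoothness prefers the trivial constant labelling, because every edge is equally smooth. The cluster-aware term instead prefers the labelling that splits at the boundary between the two clusters. This mirrors the abstract's point that smoothness over a single graph can ignore the discrimination among boundary instances.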

1. Introduction

In many real applications, unlabeled data can be collected easily and cheaply. Acquiring labeled data, by contrast, is usually expensive and time-consuming, especially when it involves manual effort. For instance, in web page recommendation, huge numbers of web pages are available, but few users are willing to spend time marking which pages interest them. In spam email detection, a large number of emails can be collected automatically, yet few of them have been labeled as spam or not by users.

Consequently, semi-supervised learning has attracted intensive attention over the past decades. It exploits a large amount of unlabeled data jointly with the limited labeled data. In this paper, we focus on semi-supervised classification, for which many methods have already been developed [1–4].

Generally, semi-supervised classification methods attempt to exploit the intrinsic data distribution information disclosed by the unlabeled data. This information is usually considered helpful for learning. To exploit the unlabeled data, some assumption must be adopted.

Two common assumptions in semi-supervised classification are the cluster assumption and the manifold assumption [3–5]:
- The cluster assumption holds that similar instances are likely to share the same class label. It therefore guides the classification boundary through the low-density region between clusters.
- The manifold assumption holds that the data reside on some low-dimensional manifold represented by a Laplacian graph. Similar instances should then receive similar classification outputs according to the graph.

Almost all off-the-shelf semi-supervised classification methods adopt one or both of these assumptions, explicitly or implicitly [1,4]. For instance, large-margin methods adopt the cluster assumption. Examples are the transductive Support Vector Machine (TSVM) [6], the semi-supervised SVM (S3VM) [7] and their variants [8,9]. Graph-based methods adopt the manifold assumption. Examples are label propagation [10,11], graph cuts [12] and manifold regularization (MR) [13]. Furthermore, some methods combine both assumptions for better performance, such as RegBoost [14] and SemiBoost [15].
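To illustrate how graph-based methods use the manifold assumption, here is a minimal sketch of iterative label propagation in the style of harmonic-function methods. The specific formulations in [10,11] may differ. Labelled nodes are clamped to ±1, and each unlabelled node repeatedly takes the weighted average of its neighbours. The graph, the iteration count and the function name are illustrative assumptions.

```python
def propagate_labels(w, labels, iters=200):
    """w: symmetric weight matrix (list of lists); labels: dict node -> +1/-1.
    Unlabelled nodes start at 0 and are repeatedly replaced by the weighted mean
    of their neighbours; labelled nodes are never updated, so they stay clamped.
    In the limit this reaches the harmonic solution on the graph."""
    n = len(w)
    f = [float(labels.get(i, 0.0)) for i in range(n)]
    for _ in range(iters):
        new = f[:]
        for i in range(n):
            if i in labels:
                continue
            deg = sum(w[i])
            if deg > 0:
                new[i] = sum(w[i][j] * f[j] for j in range(n)) / deg
        f = new
    return f


n = 6
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]  # path 0-1-2-3-4-5
scores = propagate_labels(w, {0: 1, 5: -1})
print([round(s, 3) for s in scores])  # [1.0, 0.6, 0.2, -0.2, -0.6, -1.0]
```

On the six-node path, the scores converge to a linear interpolation between the two labelled endpoints. Taking signs therefore labels the first three nodes positive and the last three negative.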

In this paper, we concentrate on the MR framework [13], which provides an effective way for semi-supervised classification [16] and has been applied in diverse applications such as image …

Neurocomputing 147 (2015) 299–306
Contents lists available at ScienceDirect; journal homepage: www.elsevier.com/locate/neucom
DOI: http://dx.doi.org/10.1016/j.neucom.2014.06.059
Corresponding author: S. Chen. Tel.: +86 25 84892956; fax: +86 25 84892811. E-mail address: s.chen@nuaa.edu.cn
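For reference, the manifold regularization objective that the introduction concentrates on is commonly stated as below. This is the standard form from the MR literature, not an equation reproduced from this preview. The DAMR objective itself is not shown here.

The symbols are as follows:
- l and u are the numbers of labeled and unlabeled instances.
- V is a loss function.
- H_K is a reproducing kernel Hilbert space.
- W is the graph weight matrix and D is its diagonal degree matrix.
- γ_A and γ_I weight the ambient (kernel) and intrinsic (graph) smoothness terms.

```latex
f^{*} = \arg\min_{f \in \mathcal{H}_K} \; \frac{1}{l}\sum_{i=1}^{l} V\big(x_i, y_i, f\big)
      + \gamma_A \lVert f \rVert_K^{2}
      + \frac{\gamma_I}{(l+u)^{2}}\, \mathbf{f}^{\top} L\, \mathbf{f},
\qquad \mathbf{f} = \big[f(x_1), \dots, f(x_{l+u})\big]^{\top}, \quad L = D - W,

\mathbf{f}^{\top} L\, \mathbf{f} = \frac{1}{2}\sum_{i,j=1}^{l+u} W_{ij}\,\big(f(x_i) - f(x_j)\big)^{2}.
```

The second identity shows why the Laplacian term enforces smoothness. It sums squared output differences over graph edges, weighted by edge similarity.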
