Semi-supervised support vector classification with self-constructed Universum

Yingjie Tian a,b, Ying Zhang c, Dalian Liu d,*

a Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100190, China
b Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China
c School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100190, China
d Department of Basic Course Teaching, Beijing Union University, Beijing 100101, China
Article info
Neurocomputing 189 (2016) 33–42
Article history:
Received 12 March 2015
Received in revised form 16 October 2015
Accepted 15 November 2015
Available online 26 November 2015
Communicated by Yongdong Zhang
Keywords:
Semi-supervised
Classification
Universum
Support vector machine
Abstract
In this paper, we propose a strategy for the semi-supervised classification problem in which a support vector machine with self-constructed Universum is solved iteratively. Universum data, which belong to neither class of interest, have been shown to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand. Our method seeks increasingly reliable positive and negative examples from the unlabeled dataset step by step, applying the Universum support vector machine (U-SVM) at each iteration. Since different Universum data yield different performance, several effective approaches for constructing Universum datasets are explored. Experimental results demonstrate that an appropriately constructed Universum improves accuracy and reduces the number of iterations.
© 2016 Published by Elsevier B.V.
1. Introduction
In traditional supervised learning, the decision function is acquired only by learning from a labeled dataset. However, in many applications of machine learning, such as image retrieval [1], text classification [2], and natural language parsing [3], abundant unlabeled data can be acquired cheaply and automatically, whereas labeling samples manually is labor-intensive and very time consuming. In such situations, traditional supervised learning usually degrades for lack of sufficient supervised information. Semi-supervised learning (SSL) [4–9] has attracted increasing interest; it addresses this problem by using a large amount of unlabeled data, together with the labeled data, to build a better classifier.
Semi-supervised learning problem: Given a training set

$T = \{(x_1, y_1), \ldots, (x_l, y_l)\} \cup \{x_{l+1}, \ldots, x_{l+q}\}, \quad (1)$

where $x_i \in \mathbb{R}^n$, $y_i \in \{1, -1\}$ for $i = 1, \ldots, l$; $x_i \in \mathbb{R}^n$ for $i = l+1, \ldots, l+q$; and the set $\{x_{l+1}, \ldots, x_{l+q}\}$ is a collection of unlabeled inputs known to belong to one of the classes, predict the outputs $y_{l+1}, \ldots, y_{l+q}$ for $\{x_{l+1}, \ldots, x_{l+q}\}$ and find a real function $g(x)$ on $\mathbb{R}^n$ such that the output $y$ for any input $x$ can be predicted by

$f(x) = \operatorname{sgn}(g(x)). \quad (2)$
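As a concrete illustration (a toy sketch, not code from the paper), the following Python snippet lays out a small instance of problem (1) and applies the decision rule (2) with an arbitrary linear score function; all names and data here are hypothetical.

```python
import numpy as np

# Toy instance of problem (1); the data and names are illustrative only.
rng = np.random.default_rng(0)
l, q, n = 10, 40, 2
X_lab = rng.normal(size=(l, n))                 # x_1, ..., x_l
y_lab = np.where(X_lab[:, 0] > 0.0, 1, -1)      # y_i in {1, -1}
X_unl = rng.normal(size=(q, n))                 # x_{l+1}, ..., x_{l+q}: labels unknown

# Any real-valued score g: R^n -> R induces the classifier (2), f(x) = sgn(g(x)).
def g(X, w=np.array([1.0, 0.0]), b=0.0):
    return X @ w + b                            # e.g. a linear score <w, x> + b

y_pred = np.sign(g(X_unl))                      # predicted outputs for the unlabeled inputs
```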
The motivation of semi-supervised methods is to take advantage of the unlabeled data to improve performance. There are roughly five families of methods for solving the semi-supervised learning problem: Generative methods [10–13], Graph-based methods [14–16], Co-training methods [17,18], Low-density separation methods [19,20], and Self-training methods [21–23].
Self-training is probably the earliest idea for using unlabeled data in classification, and it remains a commonly used technique. It is also known as self-learning, self-labeling, or bootstrapping (not to be confused with the statistical procedure of the same name). It is a wrapper algorithm that repeatedly applies a supervised method: first, a classifier is trained on the small set of labeled examples; it then classifies the unlabeled data, and the most confidently predicted unlabeled points are added to the training set. The classifier is re-trained on the enlarged set and the process is repeated. This idea has been used in many applications [24–26]. Our method belongs to this family, as sketched below.
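A minimal sketch of this wrapper loop, assuming scikit-learn's SVC as the base learner and class-probability thresholds as the confidence measure (both our illustrative choices, not the paper's U-SVM procedure), might look like this:

```python
import numpy as np
from sklearn.svm import SVC

def self_training(X_lab, y_lab, X_unl, confidence=0.9, max_iter=20):
    """Generic self-training wrapper; a sketch, not the paper's exact algorithm."""
    X_lab, y_lab, X_unl = X_lab.copy(), y_lab.copy(), X_unl.copy()
    clf = SVC(probability=True)                  # base learner with confidence scores
    for _ in range(max_iter):
        clf.fit(X_lab, y_lab)
        if len(X_unl) == 0:
            break
        proba = clf.predict_proba(X_unl)
        keep = proba.max(axis=1) >= confidence   # most confident unlabeled points
        if not keep.any():
            break
        # Move confident points, with their pseudo-labels, into the training set.
        X_lab = np.vstack([X_lab, X_unl[keep]])
        y_lab = np.concatenate([y_lab, clf.classes_[proba[keep].argmax(axis=1)]])
        X_unl = X_unl[~keep]
    clf.fit(X_lab, y_lab)                        # final re-training on the enlarged set
    return clf
```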
The Universum, defined as a collection of unlabeled points known not to belong to either class, was first proposed in [27]. It captures a general backdrop against the problem of interest and is expected to represent meaningful information connected with the classification task at hand. A Universum dataset is easy to acquire, since few requirements are imposed on it. Additionally, it can capture some prior information about the ground-truth decision function.
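As a concrete example of construction, one common scheme in the Universum literature (possibly differing from the approaches explored later in this paper) generates "in-between" Universum points by averaging randomly paired examples from the two classes, so the generated points plausibly belong to neither class:

```python
import numpy as np

def in_between_universum(X_pos, X_neg, size, seed=0):
    """Generate Universum points as midpoints of random positive/negative pairs.

    A minimal sketch of the 'in-between' construction; the function name and
    parameters are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X_pos), size)       # random positive indices
    j = rng.integers(0, len(X_neg), size)       # random negative indices
    return 0.5 * (X_pos[i] + X_neg[j])          # midpoints lie between the classes
```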
* Corresponding author.
E-mail addresses: tyj@ucas.ac.cn (Y. Tian), zhangying112@mails.ucas.ac.cn (Y. Zhang), ldlluck@sina.com (D. Liu).