Relation Prediction in Heterogeneous Information Networks from a PU Learning Perspective

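This research paper explores a relation prediction method based on PU learning, applied mainly to heterogeneous information networks. In the PU learning setting, it addresses data imbalance, that is, the difference in size between the positive set (the node pairs that have the target relation) and the unlabeled set (the node pairs that do not have the target relation). The paper proposes SemiPUclus, a technique that combines K-means clustering with a voting mechanism to extract a reliable negative set RN from the unlabeled set, and builds a new relation prediction framework, PURP. Experimental results show that PURP outperforms the comparison methods on DBLP co-authorship network data. Keywords include link prediction and relation prediction.

Relation prediction is a key problem in data mining and network analysis, particularly in heterogeneous information networks such as social networks and knowledge graphs. These networks consist of entities of several types (people, organizations, events, and so on) and the complex relations among them. The goal of relation prediction is to predict whether a specific relation exists between two given entities.

This study focuses on applying PU learning (Positive and Unlabeled Learning) to relation prediction. PU learning is a classification approach for settings where only positive and unlabeled examples are available, which makes it well suited to cases where labels are costly or hard to obtain. In relation prediction, obtaining complete labels that say whether every node pair has a specific relation is usually difficult and expensive, so PU learning offers an effective strategy.

To address data imbalance in the PU learning setting, the paper proposes the SemiPUclus algorithm. It combines K-means clustering with a voting mechanism to select, from the unlabeled set U, the node pairs most likely to be negative, which form the set RN. K-means clustering groups the unlabeled node pairs, and the voting mechanism determines which node pairs are most likely not to have the target relation. In this way the algorithm identifies node pairs without the target relation more accurately, which improves prediction accuracy.

In the proposed framework PURP (Positive-Unlabeled Relationship Prediction), the negative set RN extracted by SemiPUclus is used together with the positive set P to train the model. The model can thus learn a more effective representation of the relation with less supervision, which improves prediction performance.

In the experiments, the researchers use the DBLP co-authorship network as the dataset. It is a typical heterogeneous information network in which nodes represent authors and edges represent their collaborations. The results show that, compared with the traditional comparison methods, PURP achieves higher precision and recall when predicting co-authorship between authors, which demonstrates its effectiveness in practice.

The paper offers a novel PU learning solution for relation prediction that is especially suited to heterogeneous information networks where labeling is difficult. SemiPUclus and the PURP framework provide useful theoretical and practical guidance for handling data imbalance and improving prediction performance. Future research may explore how to optimize the framework for more complex network structures and more diverse relation types.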

2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE)

A Relation Prediction Method Based on PU Learning

Gao-Jing Peng, Ke-Jia Chen, Shijun Xue, Bin Liu

Jiangsu Key Laboratory of Big Data Security & Intelligent Processing

Nanjing University of Posts and Telecommunications

Nanjing, Jiangsu 210046, China

penggj_njupt@163.com, chenkj@njupt.edu.cn, xueshijun_2010@163.com, bins@ieee.org

Abstract: This paper studies relation prediction in heterogeneous information networks in the PU learning context. One challenge of this problem is the size imbalance between the positive set P (the set of node pairs with the target relation) and the unlabeled set U (the set of node pairs without the target relation). We propose SemiPUclus, a technique based on K-means and a voting mechanism, to extract the reliable negative set RN from U under a new relation prediction framework, PURP. The experimental results show that PURP achieves better performance than comparative methods on DBLP co-authorship network data.

Keywords: link prediction; relation prediction; heterogeneous information networks; PU learning

I. INTRODUCTION

Link prediction aims to estimate how likely missing links or future links are to form in a network, based on the network’s current or historical data. It has a wide range of applications, such as citation prediction in a bibliographic dataset, product recommendation in an e-commerce service, and online advertisement click prediction in an online network [1]. Most existing link prediction methods are proposed for homogeneous information networks, where there is only a single type of node and edge.

However, real networks usually contain multiple types of nodes and edges. Such networks are called heterogeneous information networks (HINs). In an HIN, structural dependencies among different relations also increase the difficulty of link prediction [2] [3] [4]. Recently, Sun et al. [5] used the concept of meta-path in HINs and proposed the relation prediction problem, which can be seen as an extension of the link prediction problem. Figure 1 shows an example of relation prediction in a co-authorship network. The network includes four types of nodes and ten types of links. The target relation to predict is the co-authorship between any author pair, which can be represented by the meta-path $\text{Author} \xrightarrow{\text{write}} \text{Paper} \xrightarrow{\text{write}^{-1}} \text{Author}$.

Relation prediction can be treated as a supervised learning process. If the target relation exists between $a_i$ and $a_j$, the label of the node pair $\langle a_i, a_j \rangle$ is set to “+1”; otherwise it is set to “-1”. This process normally requires many positive and negative examples to train the model. However, in many real-world fields negative examples are limited or not available. The PU learning technique makes it possible to construct a classification model from positive and unlabeled examples.

Figure 1. The co-authorship network

In the above co-authorship network, if the target relation between $a_i$ and $a_j$ does not exist at the moment, it does not mean that the target relation will not form in the future. So the label of a node pair without the target relation is better set to “0” instead of “-1”. With this assumption, all node pairs are now divided into the positive example set P and the unlabeled example set U.
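The split into P and U can be illustrated with a small Python sketch. The author identifiers, the observed co-author pairs, and the helper `split_pairs` are hypothetical. The sketch only shows that every author pair with an observed target relation goes to P with label 1, that every other pair goes to U with label 0, and how quickly U outgrows P.

```python
from itertools import combinations

def split_pairs(authors, coauthor_pairs):
    """Split all unordered author pairs into positives P and unlabeled U."""
    linked = {tuple(sorted(pair)) for pair in coauthor_pairs}
    P, U = [], []
    for pair in combinations(sorted(authors), 2):
        (P if pair in linked else U).append(pair)
    return P, U

# Hypothetical authors and observed co-authorships.
authors = ["a1", "a2", "a3", "a4", "a5"]
P, U = split_pairs(authors, [("a2", "a1"), ("a2", "a3")])

# Label 1 for pairs in P, label 0 (unlabeled, not negative) for pairs in U.
labels = {pair: 1 for pair in P}
labels.update({pair: 0 for pair in U})
print(len(P), len(U))
```

With five authors there are ten candidate pairs, of which only two are positive. This is the size imbalance between P and U that the abstract refers to.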

PU learning has become a new research topic in the field of classification. Though widely used in text mining, graph mining and so on [7] [8] [9], it was not used in link mining until recent years. In 2014, Zhang et al. [6] used PU learning for the first time to predict anchor links between multiple networks. They used the Spy technique to extract reliable negative examples. Different from their work, this paper aims to predict the target relation, which is not limited to links, in a single HIN.

The main challenges of relation prediction in HIN are:

• Extraction of reliable negative examples
The most important challenge of PU learning is to extract reliable negative examples RN from U. But for relation prediction in HIN, most of the existing semi-supervised PU learning methods are no longer suitable. It is necessary to design a new efficient method.

• Heterogeneity of network
The types of nodes or links are multiple in HIN, so traditional link prediction methods of homogeneous information networks are no longer applicable. Also, dependencies of relations between nodes and heterogeneity of links bring great difficulties for the prediction task.

• Link sparsity
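The first challenge can be illustrated with a rough Python sketch of one way K-means and voting could be combined to pick reliable negatives, as the abstract outlines. This is not the authors' SemiPUclus algorithm, whose details are not given in this excerpt. The voting rule (an unlabeled pair receives a negative vote whenever it lands in a cluster that contains no known positive), the parameters `k`, `runs`, and `min_votes`, and the 2-D feature vectors are all assumptions made for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each non-empty cluster's center moves to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def reliable_negatives(P, U, k=2, runs=5, min_votes=4, seed=0):
    """Indices of U voted negative in at least min_votes of the K-means runs."""
    rng = random.Random(seed)
    votes = [0] * len(U)
    for _ in range(runs):
        assign = kmeans(P + U, k, rng)
        positive_clusters = set(assign[:len(P)])  # clusters holding a known positive
        for i, cluster in enumerate(assign[len(P):]):
            if cluster not in positive_clusters:
                votes[i] += 1  # this run votes "negative" for U[i]
    return [i for i, v in enumerate(votes) if v >= min_votes]

# Hypothetical 2-D features for node pairs (for example, normalized path counts).
P = [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0)]
U = [(0.85, 0.95), (0.1, 0.2), (0.2, 0.1), (0.0, 0.0)]
rn_indices = reliable_negatives(P, U)
print(rn_indices)
```

On this toy input the three unlabeled pairs near the origin are flagged as reliable negatives. The pair at `(0.85, 0.95)` is not flagged, because it always clusters with the known positives. Voting over several randomly initialized runs is meant to reduce the effect of any single unlucky clustering.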
