Fusing Microblog Structure and Interaction: A New Approach to Link Prediction


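This document presents a new approach to link prediction, "Learning to Predict Links by Integrating Structure and Interaction Information in Microblogs", by Jia Yan-Tao, Wang Yuan-Zhuo, and Cheng Xue-Qi. It was published in the Journal of Computer Science and Technology, Vol. 30, No. 4, July 2015, DOI: 10.1007/s11390-015-1563-9.

Social media, and microblogs in particular, have spread widely in recent years, and unsupervised link prediction has become an active research topic. Its goal is to find a suitable similarity measure between users in a network so that potential social connections can be predicted. However, many existing methods lack an intuitive and comprehensive mechanism for combining the network structure with the interactions between users. This leaves a gap between the predicted links and the ground truth. Traditional methods often measure similarity by a single user attribute or relationship and ignore the complexity and dynamics of the network.

The authors propose a strategy to close this gap. They combine the structural information of the microblog network (such as direct relationships between users and community structure) with interaction information (such as retweets, comments, and @-mentions) as evidence for predicting potential links. This reflects the relatedness and influence between users more fully and improves prediction accuracy. According to the paper, the structure and the interaction behavior are integrated into a matrix factorization framework. The experiments tune the sparseness of a Twitter dataset and measure predictive performance with the F1-measure. The paper may also compare the method with existing approaches on F1-measure, precision, and recall. It may also discuss imbalanced network data, noise and outliers, and efficient computation on large-scale microblog data. Possible future directions include extending the method to real-time or streaming data and combining data from other social media platforms to further improve link prediction accuracy.

By combining the structural and interaction information of microblogs to improve link prediction models, the paper offers a new perspective. This is relevant to understanding the dynamics of online social networks, predicting user behavior, and social recommendation.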

Yan-Tao Jia et al.: Learning to Predict Links by Structure and Interaction Information 831

full advantage of the property of Twitter as a social medium (see [14]) or an information diffusion channel. A Twitter user A can reply to the tweets of user B and mention B in his or her own tweets, which is written as "@B". Another common practice is that A "retweets", i.e., rebroadcasts, B's message, which is written as "RT @B". For a tweet message, behavior-based methods extract the usernames after the symbol @ and consider that A and B have an interaction relationship. Hopcroft et al. [9] considered these interaction relationships and defined four features representing the number of retweets or replies from user A to user B and from user B to user A, respectively. By integrating other features, they proposed a supervised method, the Triad Factor Graph model, to predict reciprocal links. Similar work can be found in Lou et al. [10].
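The feature extraction described above can be sketched in a few lines of Python. This is a minimal sketch and not Hopcroft et al.'s implementation. It makes three assumptions: tweets arrive as hypothetical `(author, text)` pairs; a tweet beginning with `RT @name` is a retweet of `name`, following the usual Twitter convention; and every other `@name` is counted as a reply. The function names `count_interactions` and `pair_features` are illustrative.

```python
import re
from collections import Counter

# Assumption: "RT @name" at the start of a tweet marks a retweet of name;
# any other "@name" in a tweet is counted as a reply/mention of name.
RT_PATTERN = re.compile(r"^RT @(\w+)")
MENTION_PATTERN = re.compile(r"@(\w+)")


def count_interactions(tweets):
    """Count directed retweets and replies from (author, text) pairs."""
    retweets = Counter()
    replies = Counter()
    for author, text in tweets:
        rt = RT_PATTERN.match(text)
        if rt:
            # Only the retweeted user counts; mentions inside the
            # rebroadcast text belong to the original author.
            retweets[(author, rt.group(1))] += 1
            continue
        for name in set(MENTION_PATTERN.findall(text)):
            if name != author:  # ignore self-mentions
                replies[(author, name)] += 1
    return retweets, replies


def pair_features(a, b, retweets, replies):
    """The four directed interaction features for the user pair (a, b)."""
    return (
        retweets[(a, b)],  # retweets from a to b
        retweets[(b, a)],  # retweets from b to a
        replies[(a, b)],   # replies from a to b
        replies[(b, a)],   # replies from b to a
    )


tweets = [
    ("alice", "RT @bob: great paper on link prediction"),
    ("alice", "@bob thanks for sharing!"),
    ("bob", "@alice you're welcome @carol"),
    ("carol", "RT @alice: RT @bob: great paper"),
]
retweets, replies = count_interactions(tweets)
print(pair_features("alice", "bob", retweets, replies))  # (1, 0, 1, 1)
```

A `Counter` returns 0 for pairs that never interacted, so the features can be queried for any user pair without special cases.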

Our model is also behavior-based. The difference is that we integrate the structure and the interaction behavior into a simpler matrix factorization framework.

The use of matrix factorization for link prediction is motivated by its success in recommender systems, where the model finds latent features for users and items by factorizing the observed matrix (see [15-17]). Replacing the user-item pair with a user-user pair turns the link prediction problem into a link recommendation problem. Related work can be found in Menon and Elkan [18] and Yin et al. [11]. Their models learned the latent features only from the topological structure of the network. For example, Yin et al. [11] analyzed the role of the intermediate user between two users and divided its contribution into two parts: the recommendation made by the intermediate user, and the acceptance of that recommendation. Very recently, Zhang et al. [19] extended Yin et al.'s work to find the real intermediate users and studied how they contribute to the link formation process. To better predict new links in time-evolving social networks, Gao et al. [20] derived a matrix factorization model from three types of information: the global network structure, the content of nodes in the network, and the local information of a given vertex. Similar work using matrix factorization or tensor factorization can be found in [8, 21-22], among others. However, these methods do not consider the impact of interactions between users on link prediction. Our work combines the interaction information between users with the structure of the network.
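The matrix factorization idea in this paragraph can be illustrated with a small weighted factorization of a user-user follow matrix. This is a minimal sketch under stated assumptions, not the model of the cited works or of this paper. `R` is a 0/1 matrix with source users as rows and target users as columns. The model is $R \approx UV^{\top}$, fitted by full-batch gradient descent with L2 regularization. The per-entry weights `W = 1 + alpha * interactions` are a hypothetical way of letting interaction counts enter the objective. `factorize`, `weighted_loss`, `alpha`, and the toy data are illustrative.

```python
import random


def dot(a, b):
    """Inner product of two latent vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_loss(R, W, U, V, lam):
    """0.5 * sum_ij W_ij (u_i . v_j - R_ij)^2 + 0.5 * lam * (||U||^2 + ||V||^2)."""
    fit = sum(
        W[i][j] * (dot(U[i], V[j]) - R[i][j]) ** 2
        for i in range(len(R))
        for j in range(len(R[0]))
    )
    reg = sum(x * x for row in U + V for x in row)
    return 0.5 * fit + 0.5 * lam * reg


def factorize(R, W, k=2, lam=0.01, lr=0.05, epochs=500, seed=0):
    """Fit R ~ U V^T (U: n x k source factors, V: m x k target factors)
    by full-batch gradient descent on weighted_loss."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(m)]
    for _ in range(epochs):
        # Gradient of the L2 penalty ...
        gU = [[lam * x for x in row] for row in U]
        gV = [[lam * x for x in row] for row in V]
        # ... plus the gradient of the weighted squared error.
        for i in range(n):
            for j in range(m):
                err = W[i][j] * (dot(U[i], V[j]) - R[i][j])
                for f in range(k):
                    gU[i][f] += err * V[j][f]
                    gV[j][f] += err * U[i][f]
        # Update both factors from gradients computed at the same point.
        for i in range(n):
            for f in range(k):
                U[i][f] -= lr * gU[i][f]
        for j in range(m):
            for f in range(k):
                V[j][f] -= lr * gV[j][f]
    return U, V


# Toy follow matrix: two groups of source users follow two groups of targets.
R = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
# Hypothetical retweet/reply counts per (source, target) pair.
interactions = [[2, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 3],
                [0, 0, 0, 0]]
alpha = 0.5  # hypothetical strength of the interaction weighting
W = [[1 + alpha * c for c in row] for row in interactions]

U, V = factorize(R, W)
print(round(dot(U[0], V[1]), 2), round(dot(U[0], V[2]), 2))
```

Setting every interaction count to zero leaves a factorization that uses the network structure alone. A weight only changes how closely an entry is fitted, not its target value. Interactions on non-follow pairs would therefore need a different treatment, such as a separate term in the objective, which this sketch leaves open.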

3 Test the Existence of the Gap by Experiment

In this section, we first examine how the S-Model by Yin et al. [11] performs on datasets with different sparseness and obtain its best predictive performance, i.e., the maximum F1-measure achieved by S-Model. Experiments show that this maximum F1-measure does not reach its theoretical maximum of 1. This leads to the hypothesis that there is a gap between the predictive performance of S-Model and the ground truth. To narrow this gap, we propose using the interaction information between users in the dataset.

Before reproducing the experiment of Yin et al. [11], let us briefly recall S-Model. The idea of S-Model is to predict a new follower $v_j$ (called the target user) of the source user $v_i$ via the contributions of some intermediate user $v_k$. The contributions of $v_k$ can be divided into two parts: one is $v_k$'s recommendation of $v_i$ to $v_j$, and the other is $v_j$'s acceptance of $v_k$'s recommendation of $v_i$. S-Model then studies the influence of the network structure on $v_k$'s contributions by introducing the structural similarity between users.
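To make the two-part decomposition concrete, the toy function below sums recommendation × acceptance over all intermediate users. It is a deliberately simplified, hypothetical instantiation and not S-Model itself. Both parts are 0/1 indicators read from a follow matrix (`follows[a][b] = 1` if user a follows user b). The score therefore reduces to counting two-hop paths from the target through an intermediate user to the source. S-Model, as described here, additionally models the influence of structural similarity on these contributions and uses matrix factorization. The name `two_hop_scores` is illustrative.

```python
def two_hop_scores(follows, source):
    """Score each candidate new follower j of `source` as the sum over
    intermediate users k of recommendation(k) * acceptance(j, k).

    Hypothetical 0/1 instantiation (not S-Model's formulas):
      recommendation = follows[k][source]  # k follows the source, can relay its tweets
      acceptance     = follows[j][k]       # j follows k, so j sees what k relays
    """
    n = len(follows)
    scores = {}
    for j in range(n):
        # Skip the source itself and users who already follow it.
        if j == source or follows[j][source]:
            continue
        total = 0
        for k in range(n):
            if k in (source, j):
                continue  # an intermediate user must be a third party
            recommendation = follows[k][source]
            acceptance = follows[j][k]
            total += recommendation * acceptance
        scores[j] = total
    return scores


# follows[a][b] = 1 if user a follows user b (toy data).
follows = [
    [0, 0, 0, 0, 0],  # user 0 (the source) follows nobody
    [1, 0, 0, 0, 0],  # user 1 follows 0
    [1, 0, 0, 0, 0],  # user 2 follows 0
    [0, 1, 1, 0, 0],  # user 3 follows 1 and 2
    [0, 1, 0, 0, 0],  # user 4 follows 1
]
print(two_hop_scores(follows, 0))  # {3: 2, 4: 1}
```

User 3 reaches the source through two intermediate users and user 4 through one. That ordering is what a learned model would refine with graded recommendation and acceptance strengths.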

After applying the matrix factorization technique, S-Model can predict new link formation both in a single static network and between two snapshots of the Twitter network. To find the best performance of S-Model, we conduct the experiment on the static dataset with different sparseness. Here the sparseness, denoted by nf, is the average number of non-followers per source user. Starting from one original dataset, we tune the sparseness successively by randomly converting some followers into non-followers. In other words, we construct a rating-like matrix $R_{n\times m} = (r_{ij})$ whose rows correspond to the source users and whose columns correspond to the target users. Here $n$ is the number of source users, $m$ is the number of target users, $r_{ij} = 1$ if $v_j$ follows $v_i$, and $r_{ij} = 0$ otherwise. The tuning process then randomly replaces some number of 1's in each row with 0. The initial dataset corresponds to the matrix whose elements are all 1 except the diagonal elements, with sparseness nf = 1. It is easy to see that $1 \le nf \le m-1$. In the experiment, both the smoothing factor and the structural factor of S-Model are fixed at 0.01 and m = 10 000, and we study the relation between the F1-measure and nf. We depict this relation for nf = 1, ..., 31 in Fig.1, since the curves follow a similar trend for the remaining values.
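The sparseness-tuning procedure can be reproduced at small scale. The sketch below assumes n ≤ m so that every row contains a diagonal entry. It builds the initial all-ones-except-diagonal matrix (nf = 1) and raises the sparseness by randomly turning 1's into 0's until every row has exactly `target_nf` zeros. The text only fixes the average number of non-followers, so making every row identical is a simplifying assumption. `initial_matrix`, `sparsify`, and `sparseness` are hypothetical helpers. Plain Python lists would not scale to m = 10 000.

```python
import random


def initial_matrix(n, m):
    """All ones except the diagonal (nobody follows themselves), so nf = 1.

    Assumes n <= m so that every row has a diagonal entry."""
    if n > m:
        raise ValueError("expected n <= m")
    return [[0 if i == j else 1 for j in range(m)] for i in range(n)]


def sparseness(R):
    """nf: average number of non-followers (0 entries) per source user (row)."""
    return sum(row.count(0) for row in R) / len(R)


def sparsify(R, target_nf, seed=0):
    """Copy of R in which every row has exactly target_nf zeros, obtained by
    randomly turning followers (1's) into non-followers (0's)."""
    m = len(R[0])
    if not 1 <= target_nf <= m - 1:
        raise ValueError("nf must satisfy 1 <= nf <= m - 1")
    rng = random.Random(seed)
    out = [row[:] for row in R]  # leave the input matrix untouched
    for row in out:
        missing = target_nf - row.count(0)
        if missing < 0:
            raise ValueError("sparsification can only add zeros")
        ones = [j for j, r in enumerate(row) if r == 1]
        for j in rng.sample(ones, missing):
            row[j] = 0
    return out


R = initial_matrix(5, 10)  # small stand-in for the paper's m = 10 000
for nf in range(2, 6):     # successive tuning: each step starts from the last
    R = sparsify(R, nf, seed=nf)
    print(nf, sparseness(R))
```

Because each step only adds zeros to the previous matrix, every sparser dataset is nested inside the denser one it was derived from.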

From Fig.1, we can see that the maximum F1-measure obtained by S-Model is 0.007 when nf = 29.
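For reference, the F1-measure is the harmonic mean of precision and recall. A value of 1 therefore requires every predicted link to be real and every real link to be predicted. The helper below computes it assuming predictions are given as a set of (source, target) pairs, for example after thresholding or ranking the S-Model scores. The thresholding step and the name `f1_measure` are hypothetical.

```python
def f1_measure(predicted, actual):
    """Precision, recall and F1 of predicted links against ground-truth links.

    Both arguments are sets of (source, target) pairs."""
    tp = len(predicted & actual)  # correctly predicted links
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


predicted = {(0, 3), (0, 4), (1, 2), (2, 5)}
actual = {(0, 3), (1, 2), (1, 4)}
print(f1_measure(predicted, actual))
```

Here two of the four predictions are correct (precision 1/2) and two of the three real links are found (recall 2/3), which gives F1 = 4/7.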


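PHP microblog learning system: microblog learning

User feature analysis and behavior prediction based on microblog data

A microblog demo suitable for beginners

Sina Weibo interaction prediction: challenging the baseline

Examples of integrating Chinese-language interactive devices in practical applications

HTML5 microblog clone code

Designing and developing a microblog app with Android, web, and MySQL

What modules are needed to implement microblog sharing

Java/Android/MySQL microblog clone

Jupyter Notebook microblog crawler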
