Friend Recommendation in Social Networks: Combining Clustering Algorithms with Factorization Machines


"这篇研究论文探讨了如何将聚类算法与因子分解机（Factorization Machine, FM）结合，用于社交网络中的好友推荐。作者来自中国科学技术大学北京分校和湖北工程大学的计算机与通信工程学院以及计算机与信息科学学院。他们提出了一种新的模型，旨在从社交网络的大数据中挖掘有用信息，解决用户关系和行为分析，以及数据稀疏性问题。" 在当前爆炸式增长的社交网络服务（Social Network Service, SNS）中，每天都会产生大量的数据。因此，从这些数据中提取有价值的信息是一项重要的任务。该论文聚焦于分析社交网络用户的关联和行为模式，并提出了一种创新的方法来推荐好友。该方法的核心是结合聚类算法和因子分解机。首先，聚类算法被用来对用户进行分类，这有助于识别和定位用户的特征和兴趣。聚类可以将具有相似属性和兴趣的用户归入同一类别，从而提供更精确的个性化推荐基础。接着，因子分解机被引入以解决数据稀疏性问题。在社交网络中，由于用户之间的互动可能非常有限，导致数据矩阵稀疏，这会影响推荐系统的性能。因子分解机能够捕捉非显式反馈中的潜在关系，即使在数据稀疏的情况下也能有效地预测用户行为。论文中，研究人员利用马尔科夫链蒙特卡洛（Markov Chain Monte Carlo, MCMC）算法训练提出的模型，并通过实验验证了其效果。MCMC是一种统计模拟方法，常用于处理复杂的概率模型，它能帮助优化模型参数，提高推荐的准确性。通过这种结合聚类和因子分解机的方法，该模型有望提高社交网络好友推荐的准确性和效率，从而提升用户体验，促进用户间的连接。这项工作为社交网络数据分析和推荐系统设计提供了新的思路，对于理解和改善社交网络中的信息传播、用户互动具有重要意义。

Combining Clustering Algorithm with Factorization Machine for Friend Recommendation in Social Network

Yang Zhao, Yang Yang, Zhenqiang Mi

School of Computer and Communication Engineering

University of Science and Technology Beijing

Beijing, China

zhaoyangkylin@qq.com, {yyang, mizq}@ustb.edu.cn

Zenggang Xiong

School of Computer and Information Science

Hubei Engineering University

Xiaogan, Hubei, China

jkxxzg2003@163.com

Abstract—Social Network Services (SNS) have been growing explosively and generate huge amounts of data every day, so mining useful information from the big data generated by social networks is a meaningful task. In this paper, we study the relationships and behavior of social network users, and then put forward a model that combines a Clustering Algorithm with the Factorization Machine (FM) for SNS friend recommendation. With the help of the Clustering Algorithm, we classify the users, which makes it easy to locate users' characteristics and interests, and by using FM we can solve the data sparseness problem effectively. We trained this model with the Markov Chain Monte Carlo (MCMC) algorithm, verified it on a real Tencent Weibo dataset, and showed that it achieves better computational efficiency and better accuracy in recommending friends.

Keywords-Factorization Machine; Clustering Algorithm; SNS; recommendation

I. INTRODUCTION

With the rapid development of the Internet, Social Network Services such as Facebook, Twitter and Weibo have attracted the attention of almost every user. Nowadays, people use their smart terminals in their spare time to chat and to share interesting blogs on WeChat or Weibo. The information produced by SNS is huge and spreads widely, which makes it difficult to find the latent relations among users and to recommend new friends who share the same interests.

Social networks have their own characteristics: (1) they contain both real and fake information; (2) the relationships in them are complex; (3) every person has his or her own character and unique habits; (4) some people behave differently in real life and online. All of these characteristics make the data more complex and changeable. These massive data have a very low value density, but they also show some regularity. Researchers can discover valuable information from the users' latent behavior and relationships.

Clustering is one of the most important data mining techniques and is used to discover unknown groupings in a data set. In social networks, we can group users based on their behavior and characteristics. These groups are meaningful: they yield valuable information and reduce the amount of computation. Kohrs and Mercado improved time performance through clustering, which narrows the search scope and improves search efficiency [1]. This paper uses K-means on social network data, which helps reduce the computation. The data generated by SNS are extremely sparse. For such data, Matrix Factorization is widely used because it gives accurate results and reduces the influence of missing data.
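The clustering step can be illustrated with a minimal sketch of K-means (Lloyd's algorithm) in plain Python. This is not the authors' implementation: the function name `kmeans`, the two-dimensional toy features, and the choice of `k=2` are assumptions made only for illustration, and the paper's actual user features are not given in this section.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means over a list of equal-length feature vectors.

    Hypothetical helper for illustration; returns (labels, centroids).
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: attach each user to the nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy user vectors (assumed features, e.g. two normalised activity scores).
users = [
    [1.0, 0.9], [0.9, 1.1], [1.1, 1.0],  # users with low activity
    [5.0, 5.2], [5.1, 4.9], [4.9, 5.0],  # users with high activity
]
labels, centroids = kmeans(users, k=2)
print(labels)
```

Once users are grouped this way, candidate friends can be searched within a user's own cluster rather than across the whole network, which is the reduction in computation the paragraph above refers to.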

For recommendation systems, many researchers adopt the idea of collaborative filtering and refine it. Goldberg proposed a recommendation algorithm based on collaborative filtering [2], which became almost the mainstream approach to collaborative recommendation. Collaborative filtering has been applied in a variety of recommendation systems, such as the movie recommendation systems Video Recommender [3] and MovieLens [4]. Choi obtained better recommendation results than the traditional collaborative filtering algorithm by computing user neighbors and item neighbors separately [5]. However, as the amount of data grows and the data themselves become sparser, Matrix Factorization (MF) algorithms have been increasingly applied in recommender systems. Traditional matrix factorization algorithms include SVD, NMF and PMF. All of these algorithms decompose a high-dimensional matrix into two or more low-dimensional matrices. Reducing the matrix dimension also reduces the computational complexity. The traditional matrix factorization model can effectively handle the sparsity problem of big data, but it is usually designed for one specific setting. Steffen Rendle proposed the Factorization Machine (FM) model in 2010 [6]. It builds on matrix factorization and borrows the way the SVM (Support Vector Machine) expresses interactions between features, and thus combines the advantages of both. FM can effectively address the cold-start problem when facing the extremely sparse data in SNS.
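To make the FM model concrete, the sketch below scores one feature vector with the second-order FM equation from Rendle's formulation, y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. It is an illustrative sketch only: the function names, the five-feature encoding, and all parameter values are assumptions, not the authors' features or learned weights. The first function uses the well-known reformulation of the pairwise term that runs in time linear in the number of features; the second computes the same sum directly as a cross-check.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM score, pairwise term computed in O(k * n).

    x: n feature values; w0: global bias; w: n linear weights;
    V: n rows of k latent factors (V[i] belongs to feature i).
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # Per factor f: 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model, with the pairwise sum written out over all i < j."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


# Hypothetical encoding: [user A, user B, candidate C, candidate D, same-cluster flag]
x = [1.0, 0.0, 1.0, 0.0, 1.0]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.0, 0.5]
V = [[0.1, 0.2], [0.0, 0.3], [0.4, -0.1], [0.2, 0.2], [0.3, 0.1]]

print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Because every pairwise weight is the inner product of two latent vectors, an interaction between a user and a candidate who never co-occur in the training data still receives an estimate, which is why FM copes with sparse data better than a model with one independent weight per feature pair.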

Based on the above, we establish an optimized friend recommendation model that combines the K-means algorithm with FM to cluster users and recommend friends. In this article we use the Markov Chain Monte Carlo (MCMC) method to train the model on the training dataset. In the experimental part, we extract Weibo data released by Tencent in 2012; finally we verified the accuracy of our friend