Combining Clustering Algorithm with Factorization Machine for Friend
Recommendation in Social Network
Yang Zhao, Yang Yang, Zhenqiang Mi
School of Computer and Communication Engineering
University of Science and Technology Beijing
Beijing, China
zhaoyangkylin@qq.com, {yyang, mizq}@ustb.edu.cn
Zenggang Xiong
School of Computer and Information Science
Hubei Engineering University
Xiaogan, Hubei, China
jkxxzg2003@163.com
Abstract—Social Network Service (SNS) has been explosively
growing and generating huge amounts of data every day, it is a
meaningful job to mine useful information from the big data
which generated from the social networks. In this paper, we
study the relationship and behavior of social network users,
and then put forward a model which combines Clustering
Algorithm with Factorization Machine (FM) for SNS Friend
Recommendation. With the help of Clustering Algorithm, we
classified the users and make it easy to locate users’
characteristics and interests, and by using FM we can solve the
Data Sparseness problem effectively. We trained this model by
Markov Chain Monte Carlo (MCMC) algorithm and verified
our model using Tencent Webo’s real dataset and proved it has
a better computational efficiency and better accuracy in
recommending friends.
Keywords-Factorization Machine; Clustering Algorithm;
SNS; recommendation
I. INTRODUCTION
With the rapid development of Internet, Social Network
Services such as Facebook, Twitter and Weibo have called
almost every user’s attention. Nowadays, in the sparse time
people can use their smart terminal chatting and sharing
some interesting blogs on Wechat or Weibo. The information
produced by SNS is huge and it spread widely, so that it’s
difficult to find the latent relation of users and to recommend
new friends who have the same interests in SNS.
Social network has its own characteristics: (1) There are
both real and fake information in social networks; (2) The
relationship in social networks is complex; (3) Every person
has his own character and unique habit. (4) Some people
have different behaviors between real life and network life.
All characteristics of Social network make the data more
complex and changeable. These massive data have a very
low value density, but also have some regulation.
Researchers can discover some valuable information from
the users’ potential behavior and relationship.
Clustering algorithm is one of the most important
technologies for data mining, which is used to discover
unknown classification in data set. In social networks, we
can get the users’ classifications based on their behavior and
characteristics. Those classifications are meaningful, which
can get some valuable information and reduce the data
computation. Kohrs and Mercado improved the time
performance by clustering, so that the scope of searching can
be minimized, search efficiency can be enhanced [1]. This
paper use K-means in social networks, which help reduce the
calculation. The data generated by SNS are extremely sparse,
faced with those data, Matrix Factorization used widely,
because it can get the accurate results, and the influence of
missing data will be reduced.
For a recommendation system, many people adopt the
idea of collaborative filtering algorithm and improve it better
to use. Goldberg proposed the recommendation algorithm
based collaborative filtering [2], which almost became the
mainstream ideology of the collaborative recommendation.
Collaborative filtering algorithm is applied to a variety of
recommendation systems, such as the movie
recommendation system Video Recommender [3] and the
MovieLens [4]. Choi got a better recommendation result than
the traditional collaborative filtering algorithm by calculating
user neighbors and project neighbors respectively [5]. But
when facing the situation where the amount of data becomes
larger and larger and the data itself become more and more
sparse, the Matrix Factorization algorithm (MF) has been
increasingly applied in recommender systems. Traditional
matrix factorization algorithm includes: SVD, NMF and
PMF. All of these algorithms decompose a high-dimensional
matrix into two or more low latitudes matrixes. The matrix’s
dimension reduced, which means that the complexity of the
computational also reduced. The traditional matrix
factorization model can effectively handle the sparse
problem of the big data, but it is usually targeted only for a
certain circumstance. Steffen Rendle proposed the
Factorization Machine (FM) Model in 2010 [6]. It references
the expression of SVM’s (Support Vector Machine) feature
factors interaction on the basis of the matrix decomposition,
thus it concentrates the advantages of the both. FM can
effectively solve the problem of cold start when facing such
extremely sparse data in SNS.
Based on all these cases, we establish an optimized friend
recommendation model using K-means algorithm combines
with FM to cluster users and recommend friends. In this
article we use Markov Chain Monte Carlo (MCMC) method
to train the training Dataset. In the experimental part, we
extract some Weibo data which gave out by Tencent at 2012;
finally we verified the accuracy of our friend
UIC-ATC-ScalCom-CBDCom-IoP 2015
978-1-4673-7211-4/15 $31.00 © 2015 IEEE
DOI 10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.171
887
UIC-ATC-ScalCom-CBDCom-IoP 2015
978-1-4673-7211-4/15 $31.00 © 2015 IEEE
DOI 10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.171
887
UIC-ATC-ScalCom-CBDCom-IoP 2015
978-1-4673-7211-4/15 $31.00 © 2015 IEEE
DOI 10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.171
887
UIC-ATC-ScalCom-CBDCom-IoP 2015
978-1-4673-7211-4/15 $31.00 © 2015 IEEE
DOI 10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.171
887
UIC-ATC-ScalCom-CBDCom-IoP 2015
978-1-4673-7211-4/15 $31.00 © 2015 IEEE
DOI 10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.171
887
UIC-ATC-ScalCom-CBDCom-IoP 2015
978-1-4673-7211-4/15 $31.00 © 2015 IEEE
DOI 10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.171
887
UIC-ATC-ScalCom-CBDCom-IoP 2015
978-1-4673-7211-4/15 $31.00 © 2015 IEEE
DOI 10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.171
887
UIC-ATC-ScalCom-CBDCom-IoP 2015
978-1-4673-7211-4/15 $31.00 © 2015 IEEE
DOI 10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.171
887