978-1-4673-7682-2/15/$31.00 ©2015 IEEE 2093
2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)
1
Credit Distribution and Influence Maximization in
Online Social Networks Using Node Features
Xiaoheng Deng, Yan Pan, You Wu, Jingsong Gui
School of Computer Science and Engineering
Central South University
Hunan, Changsha 410083
Abstract—Influence maximization is a problem of identifying a
small set of highly influential individuals such that obtaining the
maximized influence spread after propagation in social networks.
How to evaluate the influence is essential to solve the influence
maximization problem. Meanwhile, finding out influence propa-
gation paths is one of key factors in the assessment of influence
spread. However, since most of existent models and algorithms
use degrees to simplify the activation probability on edges, node
features are always ignored in the evaluation of influence ability
for different users. In this paper, besides the node features,
the Credit Distribution (CD) model is extended to incorporate
the time-critical aspect of influence in social networks. After
assigning credit along with the action propagation paths, we
pick up the nodes which have maximal marginal gain in each
iteration to form the seed set. The experiments we performed
on real online social networks demonstrate that our approach
is efficiency and reasonability for identifying seed sets, and the
influence spread prediction by our approach is more accurately
than that of original algorithm which disregards the node features
in the influence evaluation and diffusion.
Index Terms—online social networks, influence evaluation,
influence maximization, credit distribution, greedy algorithm
I. INTRODUCTION
The developments of Internet not only bring us a unprece-
dented convenient life-style, but also change our communica-
tion method. With the increasing number of people making
friends and exchanging ideas in online social networks, the
structure of our society is more complex and diverse than ever
before. Different from most traditional media such as news-
papers or TV, information spread in online social networks
mainly bases on the fictitious relationship between individuals.
Without distance constraints, the information can be propagat-
ed extremely quickly. At the same time, diffusion of influence
in online social networks provides great challenges for viral
marketing, which aims at finding a set of individuals in the
network to maximize the word-of-mouth propagation.
In general, there are three challenges for successfully viral
marketing in social networks [15], [16]. First, what the model
is and how it reflects the influence diffusion process appro-
priately. Second, how to evaluate the influence ability of user
and what components of it. Third, how to design an efficient
algorithm to identify a set of seed nodes which have maximal
social influence in the target society.
Domingos and Richardson firstly modeled influence max-
imization as an optimization algorithmic problem [1], [2].
Kempe et al. formulated the problem of selecting the seed
set as a discrete optimization problem which has been proved
to be an NP-hard problem [3]. Meanwhile, they present-
ed two cascade models i.e. Independent Cascade (IC)
model and the Linear T hreshold (LT) model. Specially,
they demonstrated that influence maximization under both
IC and LT models maintain desired properties such as the
monotonicity and the submodularity, which allows a greedy
algorithm to approximate the optimal solution with a radio
of (1-1/e). However, one of the main constraints of existed
models considered in Kempe et al. and its follow-ups, is that
they usually require conducting significantly times of Monte-
Carlo (MC) simulations to obtain an accurate estimate of
the expected spread. This is a time-consuming process which
makes the deployment of these techniques on large-scale social
networks infeasible [4]. Therefore, it is desired to overcome
the inefficiency of the approximation algorithm and find both
efficient and scalable approaches for solving the influence
maximization problem [5]. Many effective heuristic algorithms
such as LDAG [6] and PMIA [4], [7] have been proposed to
address this issue. They improve the computational efficiency
significantly for influence maximization, but always sacrifice
accuracy to achieve high calculating speed. On the other
hand, based on users’ action records and credit assignment in
chronological order, a data-based approach has been proposed
to solve the problem of influence maximization. In particular,
it has been proved that the quality of seed nodes identified by
Credit Distribution (CD) model could be guaranteed [8].
However, all of these aforementioned methods do not fully
incorporate node features that have been well observed in the
influence evaluation and diffusion. Motivated by this idea, we
propose two concepts of influence, user static influence and
user dynamic influence. We also propose a quantifiable method
to assess the user static influence which incorporates with node
features. To be mentioned, one node’s user static influence is
made up of three components. The first one is user activity
which reflects vitality of the node in a period of time. The sec-
ond is user sensitivity which describes sensitivity between
the user and actions he performed. The last is user affinity
that reflects similarity among different users by counting their
common actions. In addition, we also leverage the temporal
nature of influence to define the user dynamic influence to
reflect the time-critical effect. In particular, we propose a new
model, the Credit Distribution with N ode F eatures (CD-
NF) model, and let the credit assigned between two adjacent
nodes by user dynamic influence. We also design a heuristic