This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
WANG et al.: SELECTING INITIAL USERS FOR INFLUENCE MAXIMIZATION IN SOCIAL NETWORKS 3
the strength with which node i governs node j (i.e., node j is
influenced by a neighbor node i according to a weighted value
w
i,j
); w
i,j
and w
j,i
are generally different from each other.
Basically, there are two popular operational diffusion models
in the literature that capture underlying dynamics of diffusion
process: general threshold model and cascade model.
In the general threshold model, consider a potential node
i, represent its incoming neighbors by set N
i
, and assume
j∈N
i
w
j,i
≤ 1; then, the decision of node i to become active
depends on a threshold function (f
i
) of the set of active neigh-
bors of i and a threshold (called θ
i
) chosen uniformly at random
by node i from the interval [0,1]. The threshold θ
i
represents the
weighted fraction of node i’s neighbors that must become active
in order for the node i to become active. The process runs until
no more active users occur.
The cascade model is a type of probabilistic models in which
a user “catches” a specific behavior from her friends. It starts
with an initial set of active nodes A
0
, and the process unfolds
in discrete steps according to the following randomized rule.
At time step t, when there exists a directed (or undirected)
edge (i, j), such that i is active and j is not, node i is given
a single chance to activate node j (this activation succeeds with
some probability that might depend on properties of nodes i
and j and/or on the set of nodes that have already tried and
failed to activate j.); If i succeeds, then j will become active in
step (t +1), but whether i succeeds, it cannot make any further
attempts to activate j in subsequent rounds. Again, the process
runs until no more activations are possible. It should be noted
that, while the cascade model seems syntactically different from
the general threshold model, these two models are, in fact,
semantically equivalent [14].
In our paper, we utilize WC diffusion model for the problem
of influence maximization. Specifically, it can be described as
follows: assuming d
j
to be the indegree of node j in social
graph G and (i, j) an edge in G,ifi is activated in round t,
then with probability 1/d
j
, j is activated by i in round (t +1).
As described previously, due to the huge computational
overhead in greedy-based schemes, we seek to propose a new
heuristic method that could be cost-effective and achieve in-
fluential range as large as possible. There exist many heuristic
schemes to solve the problem of seed selection. A degree dis-
count heuristic algorithm called DegreeDiscount was proposed
in [10], in which the IC diffusion model is used. The basic
idea in DegreeDiscount is the following. Assuming node j be
a neighbor of vertex i,ifi has been selected as a seed, then
when considering whether to select j as a new seed based
on its degree, the edge (i, j) toward node j’s degree is not
counted, i.e., discounting j’s degree by one due to the presence
of i in the seed set, and the same discount will be done on
j’s degree, if every other neighbor of j is already in the seed
set. Simulations show that the performance of this heuristic
algorithm is comparable to that of the greedy algorithm for the
independent cascade model, while its running time is much less
than that of the greedy algorithm. Furthermore, [11] extended
the DegreeDiscount scheme with WC diffusion model.
Unlike the aforementioned works, our paper investigates
how to select the initial seeds from cost-effective viewpoint
and designs a new heuristic scheme, PPRank, that considers
various factors: persuasion cost, user’s influence power, and
overlapping effect.
Information diffusion models and the top-k node problem
are also appropriately considered from the view of blogspace,
where a blogger may have a certain level of interest in a topic
and is thus susceptible to talking about it. By discussing the
topic, the blogger may influence other bloggers [15]. A mech-
anism for detecting contagious outbreaks in social networks
was proposed in [16], which demonstrated that, by monitoring
only the friends of these randomly selected students, an early
detection of flu by up to 13.9 days at Harvard College can
be obtained. Based on the observation that people with larger
numbers of friends may have a high probability of being ob-
served among one’s friend circle (i.e., the friends of randomly
selected individuals may have higher centrality in friendship
graphs than average), a lightweighted, distributed, and random
walk-based protocol, iWander, was proposed for identifying
influential users in mobile social networks [9]. Reference [17]
investigated the connection between PageRank algorithm (orig-
inally designed for web graphs) and the problem of influence
maximization, by reversing all of the links of the original
networks (so-called reverse PageRank), because, in web graph,
receiving links increases page’s ranking, which is opposite to
the content of the influence. Furthermore, PRDiscount [26]
was proposed to alleviate the “overlapping effect” existing
in reverse PageRank-like schemes. Interestingly, greedy-based
algorithm and PageRank-inspired heuristic are integrated by
[18], which conducted the greedy algorithm on a small set
of nodes, consisting of the top nodes ranked by PageRank
algorithm on social networks.
As emphasized previously, all aforementioned existing
works ignore one key aspect of influence propagation that
we usually experience in the real social life: The cost used
to persuade individuals might vary highly (due to their dif-
ferent susceptibility of being influenced). Note that, given a
fixed budget and an arbitrary cost for selecting each node, the
problem of budgeted influence maximization (BIM) has been
investigated recently in [13]. Our paper is significantly different
from BIM in following aspects. First, BIM belongs to the cate-
gory of greedy-based schemes, while instead of focusing effort
on further improving the running time of greedy algorithms,
we argue that fine-tuned heuristics may provide truly scalable
solutions to the influence maximization problem with satisfying
influence spread and blazingly fast running time. Therefore,
our paper aims to design a budget-bounded heuristic scheme;
second, complying with empirical experience, our scheme,
PPRank, explicitly distinguishes and formally characterizes
each individual’s with two factors: IP and SI, and incorporates
two metrics as a selection criterion: Price-Performance-Ratio
and IP, which could maximize influence diffusion under the
constraint of a given marketing budget.
III. E
CONOMICAL SELECTION OF INITIAL SEEDS BASED
ON
PRICE-PERFORMANCE RATIO
A. Problem Statement
The main motivation of our paper stems from the following
consideration: Individuals may be heterogeneous in terms of