StaticGreedy: Solving the Scalability-Accuracy Dilemma in
Influence Maximization
Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
{chengsuqi, shenhuaw ei, huangjunming, gqzhang, cxq}@ict.ac.cn
ABSTRACT
Influence maximization, defined as a problem of finding a
set of seed nodes to trigger a maximized spread of influ-
ence, is crucial to viral marketing on social networks. For
practical viral marketing on large scale social networks, it
is required that influence maximization algorithms should
have both guaranteed accuracy and high scalability. How-
ever, existing algorithms suffer a scalability-accuracy dilem-
ma: conventional greedy algorithms guarantee the accuracy
with expensive computation, while the scalable heuristic al-
gorithms suffer from unstable accuracy.
In this paper, we focus on solving this scalability-accuracy
dilemma. We point out that the essential reason of the
dilemma is the surprising fact that the submodularity, a
key requirement of the objective function for a greedy algo-
rithm to approximate the optimum, is not guaranteed in all
conventional greedy algorithms in the literature of influence
maximization. Therefore a greedy algorithm has to afford a
huge number of Monte Carlo simulations to reduce the pain
caused by unguaranteed submodularity. Motivated by this
critical finding, we propose a static greedy algorithm, named
StaticGreedy, to strictly guarantee the submodularity of
influence spread function during the seed selection process.
The proposed algorithm makes the computational expense
dramatically reduced by two orders of magnitude without
loss of accuracy. Moreover, we propose a dynamical update
strategy which can speed up the StaticGreedy algorithm by
2-7 times on large scale social networks.
Categories and Subject Descriptors
F.2.2 [Analysis of Algorithms and Problem Complex-
ity]: Non-numerical Algorithms and Problems; D.2.8 [Software
Engineering]: Metrics—complexity measures, performance
measures
General Terms
Algorithms, Experiments, Performance
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full cita-
tion on the first page. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-
publish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
CIKM’13, Oct. 27–Nov. 1, 2013, San Francisco, CA, USA.
Copyright 2013 ACM 978-1-4503-2263-8/13/10 ...$15.00.
http://dx.doi.org/10.1145/2505515.2505541.
Keywords
influence maximization; greedy algorithm; scalability; social
networks; viral marketing
1. INTRODUCTION
We are witnessing the increasing prosperity of online so-
cial network sites and social media sites, where people are
connected by heterogeneous social relationships. These on-
line social networks provide convenient platforms for in-
formation dissemination and marketing campaign, allowing
ideas and behaviors to flow along the social relationships in
the effective word-of-mouth manner. Many companies have
made efforts to popularize or promote their brands or prod-
ucts on online social networks by launching campaigns akin
to viral marketing. The success of viral marketing is rooted
in the interpersonal influence, which has been empirically
studied in various contexts [8, 24, 15, 11, 12, 18, 29, 1].
Influence maximization, formulated as a discrete optimiza-
tion problem by Kempe et al. [14], is a fundamental problem
for viral marketing. It aims to find a fixed-size set of seed
nodes, which can influence the maximum number of nodes,
generally referred to as influence spread. The solution of the
influence maximization problem is closely related to infor-
mation spread models, which are used to model the process
of influence spread. Two commonly-used models are the in-
dependent cascade model and the linear threshold model.
Kempe et al. [14] proved the influence maximization prob-
lem is NP-hard with either model, and proposed a greedy
algorithm to approximate the optimal solution within a fac-
tor of (1 − 1/e − ), where depends on the accuracy of
influence spread estimation. Since no algorithm can effi-
ciently estimate the exact influence spread of a given seed
set on typically sized networks [5, 7], Monte Carlo approach
is usually used to provide an approximation, resulting a s-
mall positive error .
Unfortunately, the greedy algorithm proposed by Kempe
et al. (referred to as GeneralGreedy in this paper) suffers
severe scalability problem, i.e., it relies on a huge number
of Monte Carlo simulations to achieve a fair solution, which
results in an unaffordable computation on large-scale social
networks. To overcome this problem, many efforts have been
made to explore a more scalable greedy algorithm along two
directions [17, 6, 16, 28, 22, 13, 9]. On one direction, re-
searchers insisted on Monte Carlo simulations and reduced
the number of trials that need Monte Carlo simulations to
estimate the influence spreads of node sets. For example,
a “lazy-forward” strategy was proposed to effectively reduce
the number of candidate nodes [17]. However, the reduction