Extrapolation Methods for Accelerating PageRank
Computations
Sepandar D. Kamvar
Stanford University
sdkamvar@stanford.edu
Taher H. Haveliwala
Stanford University
taherh@cs.stanford.edu
Christopher D. Manning
Stanford University
manning@cs.stanford.edu
Gene H. Golub
Stanford University
golub@stanford.edu
ABSTRACT
We present a novel algorithm for the fast computation of PageRank,
a hyperlink-based estimate of the “importance” of Web pages. The
original PageRank algorithm uses the Power Method to compute
successive iterates that converge to the principal eigenvector of the
Markov matrix representing the Web link graph. The algorithm
presented here, called Quadratic Extrapolation, accelerates the con-
vergence of the Power Method by periodically subtracting off es-
timates of the nonprincipal eigenvectors from the current iterate of
the Power Method. In Quadratic Extrapolation, we take advantage
of the fact that the first eigenvalue of a Markov matrix is known
to be 1 to compute the nonprincipal eigenvectors using successive
iterates of the Power Method. Empirically, we show that using
Quadratic Extrapolation speeds up PageRank computation by 25–
300% on a Web graph of 80 million nodes, with minimal overhead.
Our contribution is useful to the PageRank community and the nu-
merical linear algebra community in general, as it is a fast method
for determining the dominant eigenvector of a matrix that is too
large for standard fast methods to be practical.
Keywords
PageRank, link analysis, eigenvector computation
1. INTRODUCTION
The PageRank algorithm for determining the “importance” of
Web pages has become a central technique in Web search [18]. The
core of the PageRank algorithm involves computing the principal
eigenvector of the Markov matrix representing the hyperlink struc-
ture of the Web. As the Web graph is very large, containing over a
billion nodes, the PageRank vector is generally computed offline,
during the preprocessing of the Web crawl, before any queries have
been issued.
The development of techniques for computing PageRank effi-
ciently for Web-scale graphs is important for a number of reasons.
For Web graphs containing a billion nodes, computing a PageRank
vector can take several days. Computing PageRank quickly is nec-
essary to reduce the lag time from when a new crawl is completed
to when that crawl can be made available for searching. Further-
more, recent approaches to personalized and topic-sensitive Page-
Rank schemes [11, 20, 14] require computing many PageRank vec-
tors, each biased towards certain types of pages. These approaches
intensify the need for faster methods for computing PageRank.
Eigenvalue computation is a well-studied area of numerical lin-
ear algebra for which there exist many fast algorithms. However,
Copyright is held by the author/owner(s).
WWW2003, May 20–24, 2003, Budapest, Hungary.
ACM 1-58113-680-3/03/0005.
many of these algorithms are unsuitable for our problem as they re-
quire matrix inversion, a prohibitively costly operation for a Web-
scale matrix. Here, we present a series of novel algorithms devised
expressly for the purpose of accelerating the convergence of the
iterative PageRank computation. We show empirically on an 80
million page Web crawl that these algorithms speed up the compu-
tation of PageRank by 25–300%.
1.1 Preliminaries
In this section we summarize the definition of PageRank [18]
and review some of the mathematical tools we will use in analyz-
ing and improving the standard iterative algorithm for computing
PageRank.
Underlying the definition of PageRank is the following basic as-
sumption. A link from a page
to a page
can
be viewed as evidence that
is an “important” page. In particu-
lar, the amount of importance conferred on
by
is proportional
to the importance of
and inversely proportional to the number of
pages
points to. Since the importance of
is itself not known,
determining the importance for every page
requires an
iterative fixed-point computation.
To allow for a more rigorous analysis of the necessary compu-
tation, we next describe an equivalent formulation in terms of a
random walk on the directed Web graph
. Let
denote the
existence of an edge from
to
in
. Let
be the outdegree
of page
in
. Consider a random surfer visiting page
at time
.
In the next time step, the surfer chooses a node
from among
’s
out-neighbors
uniformly at random. In other words, at
time
, the surfer lands at node
with proba-
bility
.
The PageRank of a page
is defined as the probability that at
some particular time step
, the surfer is at page
. For
sufficiently large
, and with minor modifications to the random
walk, this probability is unique, illustrated as follows. Consider
the Markov chain induced by the random walk on
, where the
states are given by the nodes in
, and the stochastic transition
matrix describing the transition from
to
is given by
with
.
For
to be a valid transition probability matrix, every node must
have at least 1 outgoing transition; i.e.,
should have no rows con-
sisting of all zeros. This holds if
does not have any pages with
outdegree
, which does not hold for the Web graph.
can be
converted into a valid transition matrix by adding a complete set
of outgoing transitions to pages with outdegree
. In other words,
we can define the new matrix
where all states have at least one
outgoing transition in the following way. Let
be the number of
nodes (pages) in the Web graph. Let
be the
-dimensional col-
umn vector representing a uniform probability distribution over all