Quadratic Extrapolation for Accelerating PageRank Computation

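Summary: "Accelerating PageRank Computation with Quadratic Extrapolation" presents a novel algorithm for computing PageRank, a measure of page importance based on the link relationships between Web pages, on large-scale Web link matrices. The original PageRank algorithm relies on the Power Method, which iterates toward the principal eigenvector of the Markov matrix representing the link graph and thereby determines the importance of each page. This approach converges slowly on large networks. The new algorithm, called Quadratic Extrapolation, aims to accelerate the convergence of the Power Method. Its core idea is to periodically subtract estimates of the nonprincipal eigenvectors from the current iterate so that the iterate approaches the target faster. Specifically, the algorithm exploits the fact that the first eigenvalue of a Markov matrix is 1 and computes the nonprincipal eigenvectors from successive iterates of the Power Method. The authors show experimentally that Quadratic Extrapolation significantly speeds up PageRank computation in practice, particularly on large networks with a very large number of links. Beyond academic research, the method is also relevant to search engine optimization and recommender systems, since computing PageRank quickly and accurately helps improve the relevance of search results and the user experience.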

Extrapolation Methods for Accelerating PageRank Computations

Sepandar D. Kamvar

Stanford University

sdkamvar@stanford.edu

Taher H. Haveliwala

Stanford University

taherh@cs.stanford.edu

Christopher D. Manning

Stanford University

manning@cs.stanford.edu

Gene H. Golub

Stanford University

golub@stanford.edu

ABSTRACT

We present a novel algorithm for the fast computation of PageRank, a hyperlink-based estimate of the “importance” of Web pages. The original PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix representing the Web link graph. The algorithm presented here, called Quadratic Extrapolation, accelerates the convergence of the Power Method by periodically subtracting off estimates of the nonprincipal eigenvectors from the current iterate of the Power Method. In Quadratic Extrapolation, we take advantage of the fact that the first eigenvalue of a Markov matrix is known to be 1 to compute the nonprincipal eigenvectors using successive iterates of the Power Method. Empirically, we show that using Quadratic Extrapolation speeds up PageRank computation by 25–300% on a Web graph of 80 million nodes, with minimal overhead. Our contribution is useful to the PageRank community and the numerical linear algebra community in general, as it is a fast method for determining the dominant eigenvector of a matrix that is too large for standard fast methods to be practical.

Keywords

PageRank, link analysis, eigenvector computation

1. INTRODUCTION

The PageRank algorithm for determining the “importance” of Web pages has become a central technique in Web search [18]. The core of the PageRank algorithm involves computing the principal eigenvector of the Markov matrix representing the hyperlink structure of the Web. As the Web graph is very large, containing over a billion nodes, the PageRank vector is generally computed offline, during the preprocessing of the Web crawl, before any queries have been issued.

The development of techniques for computing PageRank efficiently for Web-scale graphs is important for a number of reasons. For Web graphs containing a billion nodes, computing a PageRank vector can take several days. Computing PageRank quickly is necessary to reduce the lag time from when a new crawl is completed to when that crawl can be made available for searching. Furthermore, recent approaches to personalized and topic-sensitive PageRank schemes [11, 20, 14] require computing many PageRank vectors, each biased towards certain types of pages. These approaches intensify the need for faster methods for computing PageRank.

Eigenvalue computation is a well-studied area of numerical linear algebra for which there exist many fast algorithms. However, many of these algorithms are unsuitable for our problem as they require matrix inversion, a prohibitively costly operation for a Web-scale matrix. Here, we present a series of novel algorithms devised expressly for the purpose of accelerating the convergence of the iterative PageRank computation. We show empirically on an 80 million page Web crawl that these algorithms speed up the computation of PageRank by 25–300%.

WWW2003, May 20–24, 2003, Budapest, Hungary.
ACM 1-58113-680-3/03/0005.

1.1 Preliminaries

In this section we summarize the definition of PageRank [18] and review some of the mathematical tools we will use in analyzing and improving the standard iterative algorithm for computing PageRank.

Underlying the definition of PageRank is the following basic assumption. A link from a page $u \in \mathrm{Web}$ to a page $v \in \mathrm{Web}$ can be viewed as evidence that $v$ is an “important” page. In particular, the amount of importance conferred on $v$ by $u$ is proportional to the importance of $u$ and inversely proportional to the number of pages $u$ points to. Since the importance of $u$ is itself not known, determining the importance for every page $i \in \mathrm{Web}$ requires an iterative fixed-point computation.
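The fixed-point idea in this paragraph can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names (`propagate_importance`, `fixed_point`), the three-page graph `toy_links`, the tolerance, and the uniform starting vector are all assumptions made for illustration, and the sketch assumes every page has at least one out-link.

```python
def propagate_importance(out_links, rank):
    """One sweep: each page splits its current importance evenly
    among the pages it links to."""
    new_rank = {page: 0.0 for page in out_links}
    for u, targets in out_links.items():
        share = rank[u] / len(targets)  # inversely proportional to the out-link count
        for v in targets:
            new_rank[v] += share        # proportional to the importance of u
    return new_rank


def fixed_point(out_links, tol=1e-12, max_sweeps=1000):
    """Repeat sweeps until the importance values stop changing (L1 change below tol)."""
    n = len(out_links)
    rank = {page: 1.0 / n for page in out_links}  # assumed uniform start
    for _ in range(max_sweeps):
        new_rank = propagate_importance(out_links, rank)
        change = sum(abs(new_rank[p] - rank[p]) for p in out_links)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
toy_rank = fixed_point(toy_links)
```

On this toy graph the values settle near 0.4 for `a` and `c` and 0.2 for `b`. The toy graph is strongly connected and aperiodic, which is why plain repetition converges here without any further modification.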

To allow for a more rigorous analysis of the necessary computation, we next describe an equivalent formulation in terms of a random walk on the directed Web graph $G$. Let $u \to v$ denote the existence of an edge from $u$ to $v$ in $G$. Let $\deg(u)$ be the outdegree of page $u$ in $G$. Consider a random surfer visiting page $u$ at time $k$. In the next time step, the surfer chooses a node $v_i$ from among $u$’s out-neighbors $\{v \mid u \to v\}$ uniformly at random. In other words, at time $k+1$, the surfer lands at node $v_i \in \{v \mid u \to v\}$ with probability $1/\deg(u)$.
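A short simulation may make the random-surfer step concrete. This is a sketch under assumptions: the helper `surf`, the graph `toy_links`, and the seed are hypothetical, and every page is assumed to have at least one out-link.

```python
import random


def surf(out_links, start, steps, rng):
    """Follow `steps` uniformly random out-links from `start`
    and return the list of visited pages."""
    page = start
    path = [page]
    for _ in range(steps):
        # Each out-neighbor is chosen with probability 1 / (outdegree of page).
        page = rng.choice(out_links[page])
        path.append(page)
    return path


# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
path = surf(toy_links, "a", 10, random.Random(0))
```

Counting how often each page appears in a long path gives an empirical estimate of the probability that the surfer is at that page.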

The PageRank of a page $i$ is defined as the probability that at some particular time step $k > K$, the surfer is at page $i$. For sufficiently large $K$, and with minor modifications to the random walk, this probability is unique, illustrated as follows. Consider the Markov chain induced by the random walk on $G$, where the states are given by the nodes in $G$, and the stochastic transition matrix describing the transition from $i$ to $j$ is given by $P$ with $P_{ij} = 1/\deg(i)$ whenever $i \to j$.
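As a rough illustration of the transition matrix just described, the sketch below assembles it from an edge list in plain Python. The function name `transition_matrix`, the edge list `toy_edges`, and the choice to collapse duplicate edges are assumptions, not details from the paper.

```python
def transition_matrix(n, edges):
    """Return P as a list of rows, with P[i][j] = 1/deg(i) for each edge
    i -> j and 0 elsewhere. Duplicate edges are collapsed (an assumption)."""
    out_neighbors = [set() for _ in range(n)]
    for i, j in edges:
        out_neighbors[i].add(j)
    P = [[0.0] * n for _ in range(n)]
    for i, targets in enumerate(out_neighbors):
        for j in targets:
            P[i][j] = 1.0 / len(targets)
    return P


# Hypothetical 4-node graph; node 3 has no out-links.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
P = transition_matrix(4, toy_edges)
row_sums = [sum(row) for row in P]
```

Rows 0 to 2 each sum to 1, while the row for node 3 is all zeros because that node has no outgoing edges.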

For $P$ to be a valid transition probability matrix, every node must have at least 1 outgoing transition; i.e., $P$ should have no rows consisting of all zeros. This holds if $G$ does not have any pages with outdegree $0$, which does not hold for the Web graph. $P$ can be converted into a valid transition matrix by adding a complete set of outgoing transitions to pages with outdegree $0$. In other words, we can define the new matrix $P'$ where all states have at least one outgoing transition in the following way. Let $n$ be the number of nodes (pages) in the Web graph. Let $\vec{v}$ be the $n$-dimensional column vector representing a uniform probability distribution over all
