Then we construct M
′
as follows:
D = p × d
T
E = p × [1]
1×n
(4)
=
1
n
n×n
if
p
is the uniform distribution
M
′
= (1 − α)(M + D) + αE
This modification improves the quality of PageRank by introducing a decay factor 1 − α
which limits the effect of rank sinks [26], in addition to guaranteeing conver gence to a unique
rank vector. Substituting M
′
for M in Equation 2, we can express PageRank as the solution
to:
3
Rank = M
′
× Rank (5)
= (1 − α)(M + D) × Rank + αp (6)
with p = [
1
n
]
n×1
. The key to creating topic-sensitive PageRank is tha t we c an bias the
computation to increase the e ffect of certain categories of pages by using a no nuniform n × 1
personalization vector for p.
4
To ensure that M
′
is irreducible when p contains any 0 entries,
nodes not reachable from nonzero nodes in p should be removed. This modification is no t
implementationally problematic. Note tha t the topic-based influencing involves introducing
additional rank to the appropriate nodes in each iteration of the computation – it is not
simply a postpr ocessing step performed on the standard PageRank vector.
In terms of the random-walk model, the personalization vector represents the addition of
a complete set of transition edg e s where the pr obability on an artificial edge (u , v) is given by
αp
v
. We will denote the solution Rank
∗
of Equation 6, with α = α
∗
and a particular p = p
∗
,
as P R(α
∗
, p
∗
). By appropriately selecting p, the rank vector can be made to prefer certain
categories of pages. The bias factor α spe c ifies the degree to which the computation is biased
towards p.
3 Topic-Sensitive PageRank
In our approach to topic-sensitive PageRank, we pr ecompute the importance scores offline,
as with ordinar y PageRank. However, we compute multiple importance scores for each page;
we compute a set of score s of the importance of a page with respect to various topics. At
query time, these importance scores are combined based on the topics of the query to form
a composite PageRank score for those pages matching the query. This score can be use d in
conjunction with other IR-ba sed scoring schemes to produce a final rank for the result pages
with respect to the query. As the scor ing functions of commercial search engines are not
known, in our work we do not consider the effect of these IR scor e s (other than requiring
that the query terms appear in the page).
5
We believe that the improvements to PageRank’s
precision will transla te into improvements in overall search ra nk ings, even after other IR-based
scores are factored in. Note that the topic-sensitive PageRank score itself implicitly makes
use of IR in determining the topic of the query. However this use of IR is not vulnerable to
manipulation of pages by adversarial webmaster s seeking to raise the sc ore of their sites.
3
Equation 6 makes use of the fact that k
Rank
k
1
= 1.
4
Page et al.[26] originally suggest setting
p
directly u sing the bookmarks of the user, although that
approach is not practical for large numbers of users.
5
For instance, most search engines use term weighting schemes which make special use of HTML
tags.
5