(k jLj) places of L, the precision at the ranking position k
is defined as following (P@k is short for Precision@k),
Precision@k ¼ N
k
=k ð1Þ
2.2 Diversity and novelty of recommendations
As mentioned above, the diversity of recommendations
can be measured in two ways: individual and aggregate.
The metrics of individual diversity usually characterize an
accumulated dissimilarity between all pairs of items
within a recommendation list, e.g., intra-list similarity [4,
34] and item novelty [15]. In contrast to individual
diversity, there have been few studies that explore
aggregate diversity in recommender systems. Herlocker
et al. [13] experimented with the coverage defined as the
percentage of items for which the recommender system is
able to make recommendations. Adomavicius and Kwon
[1] defined diversity-in-top-N as an aggregate diversity
measure, to count the total number of distinct items rec-
ommended to all users. Since we intend to evaluate rec-
ommender systems based on the top-k recommended lists
of items that the system provides to its users, in this
paper, we take hamming distance as our aggregate
diversity measure. Given two candidate lists L
u
and L
v
for
user u and v, the difference in the top-k
(k m axðjL
u
j; jL
v
jÞ) places can be measured using
hamming distance as following [33],
HD@k ¼ 1 overlap@k=k ð2Þ
where overlap@k is the number of shared resources in the
top-k places of the two lists. Averaging over all pairs of
users, we can obtain the aggregate diversity of the system.
Clearly, HD@k can characterize the uniqueness of different
user’s recommendation lists, higher diversity means higher
personalization of users’ recommendation lists, HD@k ¼ 1
points to the fact that every user receives his/her own
unique top-k items [33].
Novelty and diversity are different though related
notions. The novelty of a piece of information generally
refers to how different it is with respect to ‘‘what has been
previously seen’’ by a specific user, or a community as a
whole [29]. To evaluate aggregate novelty of recommen-
dations, we use popularity-based item novelty proposed by
Vargas & Castells [29],
Nov@k ¼logpðkjuÞð3Þ
where k indicates the k-th resource in the recommendation
list L of u. Here, the posterior pðkjuÞ cannot be estimated
directly, since it assumes no observation of u accessing k
before recommendations are made. Instead, Vargas &
Castells suggested estimating pðkjuÞ based on other items
the user has accessed. Let us assume the observed data
consist of a set S of user-based assignment of tags to
resources, reflecting item access by users; then, pðkjuÞ can
be inferred as followings,
pðkj uÞ/
P
i2I
u
pðkjiÞpðijuÞ
pðkjiÞ/
jU
k
\ U
i
j
jU
i
j
pðijuÞ/
jðu; t; iÞ2Sj
jðu; t; i
0
Þ2Sj
ð4Þ
where I
u
denotes the set of items collected by u, and U
k
denotes the set of users who have accessed k. We average
Nov@k across all users, to measure the capability of a
recommender system provides novel items to its users.
Since the novelty measure is not normalized and universal
for different systems, it is considered as a complementary
measure to the diversity. Note that there still exists an
apparent dilemma between recommendation accuracy and
diversity (novelty). Thus, in this paper, we aim to find new
techniques to improve aggregate diversity and novelty
while maintaining accuracy of recommendations, espe-
cially for folksonomy-based systems.
3 Methodology
3.1 Folksonomy schema
A folksonomy describes the users, the resources, the tags
and the user-based assignment of tags to resources. It is a
tuple ðU; T; R; SÞ where U; T and R are finite sets whose
elements are called users, tags and resources, respectively,
and S is a ternary relation between them, i.e.,
S ¼ U T R, whose elements are called tag assignments
[17]. Users are typically described by their user ID, and
tags may be arbitrary strings. What is perceived as a
resource depends on the type of social application system.
For instance, in Delicious, the resources are URLs, and in
Lastfm, the resources are artists and tracks. There are also
other homogenous links among users or resources, such as
friendship among users, hyperlinks among Web pages and
citation links among papers. In addition, resources may
have various attributes information. A typical folksonomy
schema can be described as Fig. 1, which can be also
formulated as an association link network [21].
3.2 Personalized PageRank
PageRank [5] is an ordering nodes technique by a random
surfer model in a directed graph G ¼ðV; EÞ, where
VðjVj¼nÞ is the set of nodes and E is the set of edges. The
random surfer performs a Markovian walk on G. The surfer
jumps from node to node following a link with uniform
probability d (called as damping factor) or gets bored and
Pers Ubiquit Comput (2014) 18:1855–1869 1857
123