3.3. Recommender components
In collaborative tagging systems, item recommendation is
formulated as: given a target user u 2 U, we first use various
components based on different principles to predict the likelihood
of an unseen item (or resource) i 2 I to be accessed by u, namely, to
estimate pðijuÞ, then these unseen items are ranked according to
pðijuÞ and suggested to u.
3.3.1. Collaborative filtering
User-based CF (UCF). User-based collaborative filtering [2,44] is
based on the assumption: a natural way to find the contents of
interest for a user u is to first find other like-mind users of u, and
then recommend u the contents these users are interested in, as
‘‘birds of a feather flock together’’. Following this, we give the UCF
model to estimate pðijuÞ, as expressed in Eq. (5),
pðijuÞ/
X
v
2NSðuÞ
pðij
v
Þ
simðu;
v
Þ
P
v
0
2NSðuÞ
simðu;
v
0
Þ
ð5Þ
where NSðuÞ is the neighbor set of u; pðij
v
Þ indicates the preference
of user
v
to item i. Here, we let pðij
v
Þ¼1, if
v
has tagged i; pðij
v
Þ¼0,
otherwise. simðu;
v
Þ represents the similarity between u and
v
, and
can be calculated using cosine-based or jaccard-based functions in
combination with different feature weighting schemes.
Item-based CF (ICF). Different from UCF, Item-based CF makes
suggestions considering those items are similar to the items
collected by the target user [15,2]. To estimate pðijuÞ based on
the ICF, we use Eq. (6),
pðijuÞ/
X
j2I
u
simði; jÞ
P
i
0
2NSðjÞ
simði
0
; jÞ
pðjjuÞð6Þ
where NSðjÞ is the neighbor set of item j; pðjjuÞ indicates the
preference of u to j. Here, we let pðjjuÞ¼1=jI
u
j, and I
u
is the item
set collected by u. simði; jÞ can be computed using different combi-
nations of similarity functions and profiling models.
Social-based CF (SCF). Generally, users in social systems are
strongly affected by their social friends, and more prefer to adopt
the recommendations came from their friends. Thus social
relations provide a trustable manner to make recommendation in
social tagging systems. If there is enough social information
available, more reliable and accurate recommendations should be
achieved. For this, we introduce another UCF-like recommender
proposed in Ben-Shimon et al. [6]. The recommender explicitly
introduces distances between users in the general UCF framework
against a social graph, as in Eq. (7),
pðijuÞ/
X
v
2FSðu;LÞ
pðij
v
Þb
lðu;
v
Þ
simðu;
v
Þ
P
v
0
2FSðu;LÞ
simðu;
v
0
Þ
ð7Þ
where b is an attenuation coefficient of the social network that
adjusts the effect of the distance, lðu;
v
Þ, between two users.
simðu;
v
Þ can be estimated in the same way as in the UCF. FSðu; LÞ
is the set of neighbors who have a social relation with the user u
by a maximum path length L. Here, we can use a Breadth-First Search
algorithm to find FSðu; LÞ for each user [35]. The search starts from
the node u, accesses u’s directly-connected nodes, marks these
nodes with distance l ¼ 1, and adds them to FS. Then, for every node
in FS, it mark its neighbors not included in FS with distance l ¼ l þ 1,
and adds such neighbors to FS. Finally, it repeats this process a few
more times to find the set FSðu; LÞ.
3.3.2. Random walks
Different from CF-based methods, network-based methods
consider relevance or similarities propagation on folksonomy
graph, and attempt to fully utilize information provided by folks-
onomy data, thus can alleviate the problem of data sparsity.
Random Walk with Restart(RWR). RWR-based similarity has been
proved a good measure to personalized recommendation [26,48].
Given a directed graph, it considers a random particle that starts
from node x. The particle iteratively transmits to its neighborhood
with the probability that is proportional to their edge weights. Also
at each step, it has some probability d (the classic setting is 0.85) to
return to x. The relevance score of node y w.r.t node x; pðyjxÞ,is
defined as the steady-state probability that the particle will finally
stay at node y. Such a steady probability can be gotten using power
method computing until convergence, shown as Eq. (8),
p
ðtÞ
¼ dAp
ðt1Þ
þð1 dÞe ð8Þ
where A is the column-normalized adjacent matrix of the graph.
A
ji
¼ 1=O
i
if node i links to node j, and O
i
is the outgoing degree of
the node i; otherwise, A
ji
¼ 0. The vector e is the preference vector
that is usually configured as e
x
¼ 1 and e
y
¼ 0 for any other node y.
To use RWR model in item recommendation, a social tagging
graph should adhere to the folksonomy schema. In this graph,
the relevance score of resource i w.r.t user u is correspondingly
defined as the steady-state probability that the particle will finally
stay at node i, namely, pðijuÞ. To estimate pðijuÞ, we suggest project-
ing the folksonomy schema into several sub-schemas and apply
RWR-based similarity on the user-resource (UR) and resource-tag
(RT) bipartite graphs, respectively. Correspondingly, for the UR
bigraph, we set e to prefer the node representing u, and for the
RT bigraph, we set e to prefer the set of resource nodes I
u
in the
profile of u [48]. The configuration for e is described as Eq. (9),
e
x
¼
1 if x ¼ uðUR bigraphÞ
1=jI
u
j if x 2 I
u
ðRT bigraphÞ
0 otherwise
8
>
<
>
:
ð9Þ
Probability Spreading (ProbS). Supposing that a kind of token is
initially located on items, each item will averagely distribute its
token to all neighboring users, and then each user will redistribute
the received token to all his/her collected items. Given a user-
resource bigraph, and suppose o
i
and o
u
to represent respectively
the number of users who have collected resource i and the number
of resources collected by user u. ProbS works by assigning items an
initial level of ’’tokens’’ denoted by the vector p (where p
i
is the
‘‘token’’ possessed by item i), and then redistributing it via the
transformation: p
0
¼ Wp, where
W
ij
¼
1
o
j
X
u2U
A
iu
A
ju
o
u
ð10Þ
is a column-normalized n n matrix. The adjacency matrix A
corresponds to the user-resource bigraph, where A
iu
¼ 1ifitemi is
collected by the user u,andA
iu
¼ 0otherwise[56]. Recommendations
for a given user u are obtained by setting the initial token vector p in
accordance with the items the user has previously collected. That is,
by setting p
i
¼ A
iu
. In this case, the initial token can be understood
as giving a unit recommending capacity to each collected item, and
the initial token vectors for different users have captured the
personalized preferences. The resulting recommendation list of
uncollected items is then sorted according to p
0
j
in descending order.
ProbS performs originally on the user-resource bipartite
(ProbS-UR). If we change the users to the tags, we can obtain a fur-
ther variation ProbS-RT [54].
3.3.3. Semantic models
Topic Models. A topic model is a type of statistical model for dis-
covering the abstract ‘‘topics’’ that occur in a collection of docu-
ments. Two typical models, Probability Latent Semantic Analysis
[21] and Latent Dirichlet Allocation [7] have recently gained many
H. Wu et al. / Knowledge-Based Systems 75 (2015) 124–140
127