where $\operatorname{Re}(\cdot)$ extracts the real part of a complex number.
Likewise, HolE (Nickel et al., 2016b) is equivalent to ComplEx (Hayashi & Shimbo, 2017). HolE uses the scoring function $s(i,k,j) = r_k^T (e_i \star e_j)$, where $\star$ refers to the circular correlation between $e_i$ and $e_j \in \mathbb{R}^{d_e}$ (i.e., $(e_i \star e_j)_k = \sum_{t=1}^{d_r} e_{it}\, e_{j,((k+t-2)\bmod d_r)+1}$). The idea of using circular correlation relates to associative memory (Nickel et al., 2016b). HolE uses as $M_k^{\text{HolE}}$ the circulant matrix resulting from $r_k$. In the remainder of this paper, we use the formulation using $M_k^{\text{ComplEx}}$ given above.
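To make the circular-correlation score concrete, the following NumPy sketch (our own illustration with hypothetical helper names, not code from HolE or this paper) computes $e_i \star e_j$ both directly from the definition above (using 0-based indices) and via a Fourier-transform identity, and then evaluates $s(i,k,j) = r_k^T (e_i \star e_j)$:

```python
import numpy as np

def circular_correlation(a, b):
    """Direct circular correlation: out[k] = sum_t a[t] * b[(k + t) % d] (0-based)."""
    d = a.shape[0]
    return np.array([sum(a[t] * b[(k + t) % d] for t in range(d)) for k in range(d)])

def circular_correlation_fft(a, b):
    """Same result in O(d log d): corr(a, b) = IFFT(conj(FFT(a)) * FFT(b))."""
    return np.fft.irfft(np.conj(np.fft.rfft(a)) * np.fft.rfft(b), n=a.shape[0])

def hole_score(r_k, e_i, e_j):
    """HolE score s(i,k,j) = r_k^T (e_i * e_j)."""
    return r_k @ circular_correlation(e_i, e_j)

d = 8
rng = np.random.default_rng(0)
e_i, e_j, r_k = rng.normal(size=(3, d))
assert np.allclose(circular_correlation(e_i, e_j), circular_correlation_fft(e_i, e_j))
print(hole_score(r_k, e_i, e_j))
```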
Analogy (Liu et al., 2017b). Analogy uses block-diagonal mixing matrices $M_k^{\text{Analogy}}$, where each block is either (1) a real scalar $x$ or (2) a $2 \times 2$ matrix of form $\bigl(\begin{smallmatrix} x & -y \\ y & x \end{smallmatrix}\bigr)$, where $x, y \in \mathbb{R}$ refer to entries of $r_k$ and each entry of $r_k$ appears in exactly one block. We have $d_r = d_e$. Analogy aims to capture commutative relational structure: the constraint ensures that $M_{k_1}^{\text{Analogy}} M_{k_2}^{\text{Analogy}} = M_{k_2}^{\text{Analogy}} M_{k_1}^{\text{Analogy}}$ for all relations $k_1, k_2$. Both DistMult (only the first case allowed) and ComplEx (only the second case allowed) are special cases of Analogy.
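As an illustration of this block structure (our own sketch with hypothetical function names, assuming for simplicity that $d_r = d_e$ is even and that all blocks are of the $2 \times 2$ kind; a scalar block corresponds to $y = 0$), the following code assembles a mixing matrix from $r_k$ and verifies numerically that two such matrices commute:

```python
import numpy as np

def analogy_mixing_matrix(r_k):
    """Block-diagonal mixing matrix built from 2x2 blocks [[x, -y], [y, x]]."""
    d = r_k.shape[0]                       # assumes d is even
    M = np.zeros((d, d))
    for b in range(d // 2):
        x, y = r_k[2 * b], r_k[2 * b + 1]  # each entry of r_k appears in exactly one block
        M[2 * b:2 * b + 2, 2 * b:2 * b + 2] = [[x, -y], [y, x]]
    return M

rng = np.random.default_rng(0)
M1 = analogy_mixing_matrix(rng.normal(size=6))
M2 = analogy_mixing_matrix(rng.normal(size=6))
assert np.allclose(M1 @ M2, M2 @ M1)       # mixing matrices of this form commute
```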
3. The Relational Tucker3 Decomposition
In this section, we introduce the Relational Tucker3 (RT) de-
composition, which decomposes the KG into entity embed-
dings, relation embeddings, and a core tensor. We show that
each of the existing bilinear models can be viewed (1) as an
unconstrained RT decomposition with a fixed (sparse) core
tensor or (2) as a constrained RT decomposition with fixed
relation embeddings. In contrast to bilinear models, the RT
decomposition allows parameter sharing across different
relations, and it decouples the entity and relation embedding
sizes. Our experimental study (Section 4) suggests that both proper-
ties can be beneficial. We also introduce a sparse variant of
RT called SRT to approach the question of whether we can
learn sparsity patterns of the core tensor from the data.
In what follows, we make use of the tensor representation
of KGs. In particular, we represent a knowledge graph $\mathcal{K}$ via a binary tensor $X \in \{0,1\}^{N \times N \times K}$, where $x_{ijk} = 1$ if and only if $(i,k,j) \in \mathcal{K}$. Note that if $x_{ijk} = 0$, we assume that the truth value of triple $(i,k,j)$ is missing instead of false. The scoring tensor of a particular embedding model $m$ of $\mathcal{K}$ is the tensor $S \in \mathbb{R}^{N \times N \times K}$ of all predicted scores, i.e., with $s_{ijk} = s_m(i,k,j)$. We use $A_k$ to refer to the $k$-th frontal slice of a 3-way tensor $A$. Note that $X_k$ contains the data for relation $k$, and that scoring matrix $S_k$ contains the respective predicted scores. Generally, embedding models aim to construct scoring tensors $S$ that suitably approximate $X$ (Nickel et al., 2016a).
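As a small, self-contained illustration of this tensor view (our own toy example, not data from the paper), the following sketch scatters a handful of triples $(i,k,j)$ into a binary tensor $X$; the frontal slice $X_k$ is then the adjacency matrix of relation $k$:

```python
import numpy as np

N, K = 4, 2                                  # number of entities and relations
triples = [(0, 0, 1), (1, 0, 2), (2, 1, 3)]  # toy (subject i, relation k, object j) triples

X = np.zeros((N, N, K), dtype=np.int8)
for i, k, j in triples:
    X[i, j, k] = 1        # x_ijk = 1 iff (i, k, j) is observed; 0 means "missing", not "false"

X_0 = X[:, :, 0]          # frontal slice X_0: adjacency matrix of relation k = 0
```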
3.1. Definition
We start with the classical Tucker3 decomposition (Tucker, 1966), focusing on 3-way tensors throughout. The Tucker3 decomposition decomposes a given data tensor into three factor matrices (one per mode) and a core tensor, which stores the weights of the three-way interactions. The decomposition can be viewed as a form of higher-order PCA (Kolda & Bader, 2009). In more detail, given a tensor $D \in \mathbb{R}^{I \times J \times K}$ and sufficiently large parameters $d_a, d_b, d_c \in \mathbb{N}$, the Tucker3 decomposition factorizes $D$ into factor matrices $A \in \mathbb{R}^{I \times d_a}$, $B \in \mathbb{R}^{J \times d_b}$, $C \in \mathbb{R}^{K \times d_c}$, and core tensor $H \in \mathbb{R}^{d_a \times d_b \times d_c}$ such that
$$ d_{ijk} = a_i^T [H \times_3 c_k] b_j, $$
where $H \times_3 c_k \in \mathbb{R}^{d_a \times d_b}$ refers to the mode-3 tensor product defined as
$$ H \times_3 c_k = \sum_{l=1}^{d_c} c_{kl} H_l, $$
i.e., a linear combination of the frontal slices of $H$. If $d_a, d_b, d_c$ are smaller than $I, J, K$, core tensor $H$ can be interpreted as a compressed version of $D$. It is well known that the CP decomposition (Kolda & Bader, 2009) corresponds to the special case where $d_a = d_b = d_c$ and $H$ is fixed to the $d_a \times d_a \times d_a$ tensor with $h_{ijk} = 1$ iff $i = j = k$, else 0.
The RT decomposition, which we introduce next, allows us
to view existing bilinear models as decompositions with a
fixed core tensor as well.
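To make the mode-3 product and the CP special case concrete, here is a small NumPy sketch (our own illustration, not from the paper): it evaluates a Tucker3 entry $d_{ijk} = a_i^T (H \times_3 c_k) b_j$ and builds the superdiagonal core tensor that reduces Tucker3 to CP:

```python
import numpy as np

def mode3_product(H, c_k):
    """H x_3 c_k = sum_l c_kl * H_l, a linear combination of the frontal slices of H."""
    return np.einsum('abl,l->ab', H, c_k)

def tucker3_entry(A, B, C, H, i, j, k):
    """d_ijk = a_i^T (H x_3 c_k) b_j."""
    return A[i] @ mode3_product(H, C[k]) @ B[j]

def cp_core(d):
    """CP special case: d_a = d_b = d_c = d and h_ijk = 1 iff i = j = k, else 0."""
    H = np.zeros((d, d, d))
    H[np.arange(d), np.arange(d), np.arange(d)] = 1.0
    return H

I, J, K, d = 5, 6, 3, 4
rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(I, d)), rng.normal(size=(J, d)), rng.normal(size=(K, d))
# With the CP core, d_ijk reduces to sum_l a_il * b_jl * c_kl.
assert np.isclose(tucker3_entry(A, B, C, cp_core(d), 1, 2, 0), np.sum(A[1] * B[2] * C[0]))
```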
In KG embedding models, we associate each entity with a single embedding, which we use to represent the entity in both subject and object position. The relational Tucker3 (RT) decomposition applies this approach to the Tucker3 decomposition by enforcing $A = B$. In particular, given embedding sizes $d_e$ and $d_r$, the RT decomposition is parameterized by an entity embedding matrix $E \in \mathbb{R}^{N \times d_e}$, a relation embedding matrix $R \in \mathbb{R}^{K \times d_r}$, and a core tensor $G \in \mathbb{R}^{d_e \times d_e \times d_r}$. As in the standard Tucker3 decomposition, RT composes mixing matrices from the frontal slices of the core tensor, i.e.,
$$ M_k^{\text{RT}} = \sum_{l=1}^{d_r} r_{kl} G_l. $$
The scoring tensor has entries
$$ s_{ijk} = e_i^T (G \times_3 r_k) e_j = e_i^T M_k^{\text{RT}} e_j. \qquad (2) $$
Note that the mixing matrices for different relations share
parameters through the frontal slices of the core tensor.
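A minimal sketch of the RT scoring in Equation (2) (our own code, not the authors' implementation; all names are illustrative): the mixing matrix $M_k^{\text{RT}}$ is composed from the shared frontal slices of $G$ and applied bilinearly to the entity embeddings. Note that $d_e$ and $d_r$ can be chosen independently:

```python
import numpy as np

def rt_mixing_matrix(G, r_k):
    """M_k^RT = sum_l r_kl * G_l: relation-specific mix of the shared slices G_l."""
    return np.einsum('abl,l->ab', G, r_k)

def rt_score(E, R, G, i, k, j):
    """Equation (2): s_ijk = e_i^T (G x_3 r_k) e_j = e_i^T M_k^RT e_j."""
    return E[i] @ rt_mixing_matrix(G, R[k]) @ E[j]

N, K, d_e, d_r = 10, 4, 6, 3              # entity and relation embedding sizes are decoupled
rng = np.random.default_rng(0)
E = rng.normal(size=(N, d_e))             # one embedding per entity (subject and object)
R = rng.normal(size=(K, d_r))             # relation embeddings
G = rng.normal(size=(d_e, d_e, d_r))      # core tensor shared across all relations
print(rt_score(E, R, G, 0, 2, 5))         # score for triple (i=0, k=2, j=5)
```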
The RT decomposition can represent any given tensor, i.e., the restriction to a single embedding per entity does not limit expressiveness. To see this, suppose that we are given a Tucker3 decomposition $A, B, C, H$ of some tensor. Now