distance functions d_L1, d_L2 and d_eL2, corresponding to the L1, L2 and squared L2 norm respectively (we report in Table 1 the scoring function with the extended form of d_L1).
RotatE [55] represents relations as rotations in a complex latent space, with h, r and t all belonging to C^d. The r embedding is a rotation vector: in each of its elements, the phase conveys the rotation along that axis, whereas the modulus is always equal to 1. The rotation r is applied to h by operating an element-wise product (once again noted with ◦ in Table 1). The L1 norm is used for measuring the distance from t. The authors demonstrate that rotation allows correctly modelling numerous relational patterns, such as symmetry/anti-symmetry, inversion and composition.
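As a minimal illustration of this scoring scheme, the following sketch (assuming PyTorch; the function name, the toy dimension d = 4 and the margin-free form are our own illustrative choices) builds the unit-modulus rotation from its phases, rotates h element-wise and returns the negative L1 distance from t:

```python
import torch

def rotate_score(h, r_phase, t):
    """Sketch of RotatE-style scoring (illustrative names, no margin term).

    h, t    : complex embeddings of shape (d,)
    r_phase : real rotation phases of shape (d,); each relation element is
              the unit-modulus complex number exp(i * phase)
    Returns the negative L1 distance between the rotated head and the tail,
    so higher values indicate more plausible facts.
    """
    r = torch.polar(torch.ones_like(r_phase), r_phase)  # modulus 1, phase = rotation
    return -torch.abs(h * r - t).sum()                  # element-wise product, L1 norm

# Toy usage with random embeddings of dimension d = 4.
d = 4
h = torch.randn(d, dtype=torch.cfloat)
t = torch.randn(d, dtype=torch.cfloat)
phase = torch.rand(d) * 2 * torch.pi
print(rotate_score(h, phase, t))
```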
3.3 Deep Learning Models
Deep Learning Models use deep neural networks to perform the LP task. Neural networks learn parameters such as weights and biases, that they combine with the input data in order to recognize significant patterns. Deep neural networks usually organize parameters into separate layers, generally interspersed with non-linear activation functions.
In time, numerous types of layers have been developed, applying very different operations to the input data. Dense layers, for instance, will just combine the input data X with weights W and add a bias B: W × X + B. For the sake of simplicity, in the following formulas we will not mention the use of bias, keeping it implicit. More advanced layers perform more complex operations, such as convolutional layers, that learn convolution kernels to apply to the input data, or recurrent layers, that handle sequential inputs in a recursive fashion.
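As a minimal sketch of the dense layer just described (assuming PyTorch; the dimensions are arbitrary), torch.nn.Linear stores the weights W and the bias B and combines them with the input:

```python
import torch

# Dense layer: combine the input X with weights W and add a bias B.
# torch.nn.Linear stores W (50 x 200) and B (50) and computes X @ W^T + B.
dense = torch.nn.Linear(in_features=200, out_features=50)

x = torch.randn(1, 200)      # a single 200-dimensional input vector
out = torch.relu(dense(x))   # non-linear activation interspersed between layers
print(out.shape)             # torch.Size([1, 50])
```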
In the LP field, KG embeddings are usually learned jointly with the weights and biases of the layers; these shared parameters make these models more expressive, but potentially heavier, harder to train, and more prone to overfitting.
We identify three groups in this family, based on the neural architecture they employ: (i) Convolutional Neural Networks,
(ii) Capsule Neural Networks, and (iii) Recurrent Neural Networks.
3.3.1 Convolutional Neural Networks. These models use one or multiple convolutional layers [33]: each of these layers performs convolution on the input data (e.g. the embeddings of the KG elements in a training fact) applying low-dimensional filters ω. The result is a feature map that is usually then passed to additional dense layers in order to compute the fact score.
ConvE [11] represents entities and relations as one-dimensional d-sized embeddings. When computing the score of a fact, it concatenates and reshapes the head and relation embeddings h and r into a unique input [h; r]; we dub the resulting dimensions d_m × d_n. This input is let through a convolutional layer with a set ω of m × n filters, and then through a dense layer with d neurons and a set of weights W. The output is finally combined with the tail embedding t using dot product, resulting in the fact score. When using the entire matrix of entity embeddings instead of the embedding of just the one target entity t, this architecture can be seen as a classifier with |E| classes.
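A hedged PyTorch-style sketch of this pipeline follows; the dimensions (d = 200 reshaped to d_m × d_n = 10 × 20), the 3 × 3 filter shape, the number of filters and the class name are illustrative choices rather than the original hyper-parameters, and the sketch scores a single ⟨h, r, t⟩ triple rather than all |E| candidate tails:

```python
import torch

class ConvEScorer(torch.nn.Module):
    """Illustrative ConvE-style scorer; hyper-parameters are assumptions."""

    def __init__(self, d=200, d_m=10, d_n=20, n_filters=32):
        super().__init__()
        assert d_m * d_n == d
        self.d_m, self.d_n = d_m, d_n
        # Set omega of convolution filters (3x3 here, chosen for illustration).
        self.conv = torch.nn.Conv2d(1, n_filters, kernel_size=3)
        conv_out = n_filters * (2 * d_m - 2) * (d_n - 2)
        # Dense layer with weights W projecting the feature map back to d neurons.
        self.dense = torch.nn.Linear(conv_out, d)

    def forward(self, h, r, t):
        # Reshape h and r to d_m x d_n and stack them into the input [h; r].
        stacked = torch.cat([h.view(-1, self.d_m, self.d_n),
                             r.view(-1, self.d_m, self.d_n)], dim=1).unsqueeze(1)
        feature_map = torch.relu(self.conv(stacked))
        projected = torch.relu(self.dense(feature_map.flatten(start_dim=1)))
        # Dot product with the tail embedding t gives the fact score.
        return (projected * t).sum(dim=-1)

scorer = ConvEScorer()
h, r, t = (torch.randn(1, 200) for _ in range(3))
print(scorer(h, r, t))
```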
ConvKB [42] models entities and relations as same-sized one-dimensional embeddings. Differently from ConvE, given any fact ⟨h, r, t⟩, it concatenates all their embeddings h, r and t into a d × 3 input matrix [h; r; t]. This input is passed to a convolutional layer with a set ω of T filters of shape 1 × 3, resulting in a T × d feature map. The feature map is let through a dense layer with only one neuron and weights W, resulting in the fact score. This architecture can be seen as a binary classifier, yielding the probability that the input fact is valid.
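Below is a comparable hedged sketch of this architecture; d = 100 and T = 64 are illustrative values, and the final sigmoid is added only to expose the binary-classifier reading of the score:

```python
import torch

class ConvKBScorer(torch.nn.Module):
    """Illustrative ConvKB-style scorer; d and T are assumed toy values."""

    def __init__(self, d=100, T=64):
        super().__init__()
        # Set omega of T filters of shape 1x3, slid over the d x 3 input matrix.
        self.conv = torch.nn.Conv2d(1, T, kernel_size=(1, 3))
        # Dense layer with a single neuron and weights W producing the score.
        self.dense = torch.nn.Linear(T * d, 1)

    def forward(self, h, r, t):
        # Concatenate h, r and t column-wise into the d x 3 input matrix [h; r; t].
        matrix = torch.stack([h, r, t], dim=-1).unsqueeze(1)   # (batch, 1, d, 3)
        feature_map = torch.relu(self.conv(matrix))            # (batch, T, d, 1)
        score = self.dense(feature_map.flatten(start_dim=1))   # (batch, 1)
        # Sigmoid (added here for illustration) reads the score as the
        # probability that the input fact is valid.
        return torch.sigmoid(score)

scorer = ConvKBScorer()
h, r, t = (torch.randn(2, 100) for _ in range(3))
print(scorer(h, r, t).shape)  # torch.Size([2, 1])
```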
ConvR [25] represents entity and relation embeddings as one-dimensional vectors of different dimensions d_e and d_r. For any fact ⟨h, r, t⟩, h is first reshaped into a matrix of shape d_e^m × d_e^n, where d_e^m × d_e^n = d_e. r is then reshaped