Z. Chen et al.: KGC: A Review
the specific relationships that may share a common path, and
divides the relationships into different groups. In this way,
it improves on the one-to-one modeling of PRA, which trains
a separate classifier for each relationship. Graph-based
knowledge graph completion methods have three problems.
First, scalability is poor and memory usage is high, because
for a group of entity pairs these algorithms must enumerate
paths to determine whether any of the possible relationships
holds between the entity pairs. Second, the number of paths
is large, and using paths as model training features may cause
feature explosion. Finally, like completion methods based on
probabilistic graphical models, graph-based models also face
the high computational complexity of large-scale data.
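The path-enumeration step behind these problems can be illustrated with a small sketch. The graph, entity names, and relation names below are invented for the example and are not from the survey; the point is that every distinct relation path between an entity pair becomes a candidate feature, and the number of such paths grows quickly with the allowed path length.

```python
from collections import defaultdict

# Toy PRA-style path enumeration (entities and relations are illustrative).
edges = defaultdict(list)  # head entity -> list of (relation, tail entity)
triples = [
    ("alice", "born_in", "paris"),
    ("alice", "works_in", "paris"),
    ("paris", "capital_of", "france"),
    ("france", "located_in", "europe"),
]
for h, r, t in triples:
    edges[h].append((r, t))

def relation_paths(src, dst, max_len):
    """Enumerate all relation paths of length <= max_len from src to dst."""
    paths = []
    def dfs(node, path):
        if node == dst and path:
            paths.append(tuple(path))
        if len(path) == max_len:
            return
        for rel, nxt in edges[node]:
            dfs(nxt, path + [rel])
    dfs(src, [])
    return paths

# Each returned path is one training feature for the pair (alice, france).
print(relation_paths("alice", "france", 3))
```

On a real large-scale graph with high node degrees, the number of enumerated paths grows exponentially in `max_len`, which is the feature explosion and memory cost described above.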
Traditional knowledge graph completion methods rely on
reasoning rules and the network structure of the knowledge
graph. As knowledge graphs expand, the defects of such
methods gradually become apparent. First, expansion makes
the data increasingly sparse and rule extraction more
difficult, and entities in the long-tail distribution have
little associated knowledge, so the above methods are greatly
limited for knowledge graph completion. Second, a knowledge
graph is in essence a semantic network in which entities and
relationships carry rich semantic information [50]; however,
traditional knowledge graph representation methods cannot
encode this semantic information, which makes it difficult to
obtain a high-quality knowledge graph. Finally, traditional
knowledge graph completion methods suffer from low
computational efficiency, high algorithmic complexity, and
poor portability and scalability. For these reasons, the
study of knowledge graph completion has shifted to the stage
of knowledge representation learning.
B. MAIN METHODS OF KNOWLEDGE GRAPH COMPLETION BASED ON
REPRESENTATION LEARNING
Because a knowledge graph is a multi-relational graph
composed of entities (nodes) and relationships (different
types of edges), it is usually organized as a network. For
example, knowledge graphs stored in the Resource Description
Framework (RDF) [51] are represented as triples. However,
network-based knowledge graph representations suffer from
two main problems in application. First, computational
efficiency: in a network-based representation, entities are
expressed as distinct nodes, so computing semantic similarity
or reasoning about the relationships between entities
requires designing a special graph algorithm for each
specific application. Such methods have poor flexibility and
scalability and cannot meet the demands of computation over
today's large-scale knowledge graphs. Second, data sparsity:
like other types of large-scale data, large-scale knowledge
graphs follow a long-tail distribution, and the entities and
relationships in the long tail face a serious data sparsity
problem [28]. To address this problem, extensive attention
has turned to knowledge representation learning [52]-[57] in
recent years.
Through machine learning, knowledge representation learning
aims to express entities, relationships, and their semantic
information as dense low-dimensional real-valued vectors in
a continuous vector space, which both preserves the inherent
graph structure of the knowledge graph and simplifies
operations. Typical knowledge representation learning
techniques generally consist of the following three parts:
1) represent relationships and entities in a continuous
space; 2) define a score function f_r(h, t) to judge the
probability that a triple (h, r, t) holds (the main
difference between models lies in the choice of score
function); 3) learn the representations of entities and
relationships by solving the optimization problem of
maximizing the plausibility of observed facts. Through
efficient computation of semantic relations between entities
and relationships in a low-dimensional space, the data
sparsity problem is effectively alleviated and the
performance of knowledge graph completion is significantly
improved. The following introduces knowledge graph
completion methods based on different representation
learning models.
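The three parts above can be sketched in a few lines. This is a minimal illustration, not any specific published model: the entity and relation names, the dimension, and the particular score function (a TransE-style distance is assumed here for concreteness) are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
entities = ["beijing", "china", "paris", "france"]
relations = ["capital_of"]

# 1) Represent entities and relations as vectors in a continuous space.
E = {e: rng.normal(size=dim) for e in entities}
R = {r: rng.normal(size=dim) for r in relations}

# 2) Define a pluggable score function f_r(h, t); models differ mainly here.
def score(h, r, t):
    return -np.linalg.norm(E[h] + R[r] - E[t])

# 3) Training would adjust E and R so observed triples such as
#    ("beijing", "capital_of", "china") score higher than corrupted ones;
#    the untrained random vectors here just show the interface.
print(score("beijing", "capital_of", "china"))
```

Swapping in a different `score` function while keeping parts 1) and 3) fixed is exactly how the model families surveyed below differ from one another.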
1) KNOWLEDGE GRAPH COMPLETION METHOD BASED ON
TRANSLATION MODEL
The translation model is the most representative classical
approach in knowledge representation learning. In 2013,
Mikolov et al. proposed the Word2Vec [54] algorithm and
observed the translation-invariance phenomenon of word
vectors, e.g., titanic - leonardodicaprio ≈ 2012 - johncusack;
that is, distribution-based word representations capture the
same kind of semantic relationship. Inspired by this
translation invariance, Bordes et al. proposed the most
representative classical translation model, TransE [55],
which led a large number of researchers into the study of the
Trans series of models, with representative improved models
including TransH [56], TransR [6], and TransD [57]. The main
idea behind translation models is to treat finding valid
triples as the translation of entities through relationships,
define a corresponding score function, and then minimize a
loss function to learn the representations of entities and
relationships.
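The "minimize a loss function" step is commonly realized in TransE-style models with a margin-based ranking loss: a valid triple should be scored closer than a corrupted one by at least a margin γ. The sketch below uses toy two-dimensional vectors and an assumed margin; the negative-sampling scheme that produces the corrupted triple is not shown.

```python
import numpy as np

def distance(h, r, t):
    # Translation distance: how far h + r lands from t.
    return np.linalg.norm(h + r - t)

def margin_loss(pos, neg, gamma=1.0):
    """Hinge loss on (h, r, t) vector triples: positive triples should
    beat corrupted ones by at least the margin gamma."""
    return max(0.0, gamma + distance(*pos) - distance(*neg))

h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
t = np.array([1.0, 1.0])          # valid tail: h + r == t exactly
t_corrupt = np.array([5.0, 5.0])  # corrupted tail, far from h + r

print(margin_loss((h, r, t), (h, r, t_corrupt)))  # → 0.0
```

The loss is zero here because the corrupted triple is already separated from the valid one by more than the margin; during training, gradient descent on this loss over many sampled pairs is what shapes the embeddings.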
Given a training set S consisting of triples (h, r, t), where
the head and tail entities h, t ∈ E (the entity set) and
r ∈ R (the relationship set), the main idea of TransE is
that if the triple (h, r, t) is true, then the sum of the
vector representations of the head entity and the relation is
close to the vector representation of the tail entity, and
otherwise it is far away; that is, when the triple holds,
h + r ≈ t, as shown in FIGURE 1. From this idea, the score
function of the TransE model can be obtained as
f_r(h, t) = −‖h + r − t‖_{L1/L2} [55], which is the negative
of the L1 or L2 distance between the translated head entity
h + r and the tail entity t in the low-dimensional continuous
space.
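The h + r ≈ t intuition can be checked numerically. The two-dimensional vectors below are invented for illustration; the score f_r(h, t) = −‖h + r − t‖ is near zero for a tail that satisfies the translation and strongly negative otherwise.

```python
import numpy as np

def f_r(h, r, t, p=2):
    # p=1 or p=2 selects the L1 or L2 norm used in the TransE score.
    return -np.linalg.norm(h + r - t, ord=p)

h = np.array([0.2, 0.5])       # head entity embedding (toy values)
r = np.array([0.3, -0.1])      # relation embedding
t_good = np.array([0.5, 0.4])  # tail satisfying h + r = t
t_bad = np.array([-2.0, 3.0])  # unrelated tail

print(f_r(h, r, t_good))  # ≈ 0: triple judged plausible
print(f_r(h, r, t_bad))   # strongly negative: triple judged implausible
```

Ranking candidate tails by this score is exactly how TransE is used for completion: for a query (h, r, ?), the entity whose vector best matches h + r is predicted as the missing tail.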
The TransE model is efficient and concise and has good
predictive performance, but it has two problems: 1) TransE uses the
VOLUME 8, 2020 192439