The problem is challenging. First, it is difficult to acquire sufficient labeled data to train an effective machine learning model; to deal with this issue, a powerful unsupervised or semi-supervised model is desirable. Second, there is also the name disambiguation issue. For example, there are four different “entropy” entries in Wikipedia, and linking each of them with the corresponding concepts in academic graphs is challenging. Finally, the model needs to scale, as training and deploying a model that handles billions of concepts is not an easy task.
To deal with the aforementioned issues, especially the first one, we further clarify the supervised and unsupervised (or self-supervised) settings of embedding learning for concept linking in the following definition.
Definition 3 (Embedding Learning for Concept Linking). Given $m$ knowledge bases represented as $m$ graphs $G^p = \{C^p, R^p, A^p\}$ $(p = 1, \cdots, m)$, an embedding function $f: c\,|\,G \rightarrow \mathbb{R}^d$ is learned such that for each concept $c_i^{(p)} \in C^p$, the embedding $v_i^{(p)} = f(c_i^{(p)} \,|\, G^p)$ could be efficiently utilized to recover the full concept linkings $L = \{(c_i^{(p)}, c_j^{(q)}) \mid c_i^{(p)} \in C^p,\ c_j^{(q)} \in C^q,\ p \neq q\}$ in:
1) Supervised setting: part of $L$ is provided as the training set for training $f$.
2) Unsupervised (Self-supervised) setting: none of $L$ is provided for training $f$.
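To make the definition concrete, the following Python sketch shows one hypothetical way such embeddings could be used to recover links between two KBs, via nearest-neighbour cosine similarity. The threshold and the nearest-neighbour decision rule are illustrative assumptions, not part of the definition itself.

```python
import numpy as np

def link_concepts(emb_p: np.ndarray, emb_q: np.ndarray, threshold: float = 0.9):
    """Recover candidate links between two KBs from concept embeddings.

    emb_p: (|C^p|, d) embeddings v_i^(p) produced by f(. | G^p)
    emb_q: (|C^q|, d) embeddings v_j^(q) produced by f(. | G^q)
    Returns pairs (i, j) whose cosine similarity exceeds `threshold`
    (a hypothetical rule; the definition only requires that the
    embeddings *can* recover L, not this specific procedure).
    """
    # L2-normalize so that the dot product equals cosine similarity
    p = emb_p / np.linalg.norm(emb_p, axis=1, keepdims=True)
    q = emb_q / np.linalg.norm(emb_q, axis=1, keepdims=True)
    sim = p @ q.T                 # (|C^p|, |C^q|) similarity matrix
    best_j = sim.argmax(axis=1)   # nearest neighbour in C^q for each c_i^(p)
    return [(i, j) for i, j in enumerate(best_j) if sim[i, j] >= threshold]
```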
3 THE SELFLINKG FRAMEWORK
In this section, we present the self-supervised embedding
learning framework—SelfLinKG—for linking concepts across
knowledge bases. We will first discuss the motivation of
SelfLinKG and then introduce its two components.
3.1 Motivation
In related fields of concept linking, such as entity alignment, embedding-based methods are generally based on supervised learning. Supervised learning has achieved great success in the last decade, but it suffers from a heavy dependency on manual labels and poor scalability on unseen data. These problems are especially fatal to large-scale concept linking and entity alignment: a large amount of manually labeled data is too expensive, and bringing the linking system online requires the algorithm to be scalable.
Despite these drawbacks of supervised learning, however, people previously had few choices but to adopt it, for two important reasons shown in Figure 2:
1) Lack of embedding consistency. For concepts in different KBs, the representations lie in different and inconsistent embedding spaces (just like two people speaking two languages). To make the embeddings consistent, we can either use a supervised classifier to bridge the gap (a translator [16]) or pull them into the same embedding space through anchor nodes (both parties switching to a third language [2], [18], [31], [45]). Both methods require external supervision.
2) Lack of training objective. In supervised learning, labels serve as the objective that lets encoders draw positive samples closer and push negative samples apart. Without labels, such a goal seems impossible because positive pairs cannot be drawn together.
Fig. 2: Motivation of SelfLinKG from the perspectives of embedding consistency and training objective.
Are there any means to cope with these problems, or at least part of them, without labels? Fortunately, recent breakthroughs in self-supervised learning shed light on this question.
In terms of embedding consistency, if the KBs are in the same language, we can leverage the inherent embedding space of that language. Instead of using word embeddings trained separately on different KBs, pre-trained language models such as BERT can provide a unified initial embedding space for concepts from different KBs. During training, a shared encoder that yields embeddings for concepts from different KBs further ensures this consistency.
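As a minimal sketch of this idea, the snippet below uses a single shared pre-trained BERT encoder (via the Hugging Face transformers library) to place concept names from different KBs into one initial embedding space. The model name and the mean-pooling choice are illustrative assumptions, not SelfLinKG's exact configuration.

```python
import torch
from transformers import AutoModel, AutoTokenizer  # Hugging Face Transformers

# One shared pre-trained encoder: concept names from *any* KB are mapped
# into the same initial embedding space (model choice is an assumption).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def initial_embeddings(concept_names):
    """Return one vector per concept name, mean-pooled from BERT token states."""
    batch = tokenizer(concept_names, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**batch).last_hidden_state       # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1)       # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)        # mean pooling

# Concepts from two different KBs share the same space from the start.
v_wiki = initial_embeddings(["entropy (information theory)"])
v_mag  = initial_embeddings(["information entropy"])
```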
In terms of the training objective, without labels we cannot draw positive sample pairs together. However, negative samples are always abundant. If we push negative samples away from each other as much as possible, we effectively draw the relatively similar positive ones closer. The instance discrimination pretext task with a contrastive loss is designed exactly for this purpose.
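The sketch below shows a standard InfoNCE-style instance-discrimination loss of the kind referred to here: each encoded concept (query) is pulled toward a positive view of itself and pushed away from many negatives. The temperature value and the way positives are formed are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """Instance-discrimination (InfoNCE-style) contrastive loss.

    query:         (B, d) encoded concepts
    positive_key:  (B, d) encoded positive views of the same concepts
                          (e.g., another augmentation, since no labels exist)
    negative_keys: (K, d) encoded other concepts acting as negatives
    """
    q = F.normalize(query, dim=1)
    k_pos = F.normalize(positive_key, dim=1)
    k_neg = F.normalize(negative_keys, dim=1)

    l_pos = (q * k_pos).sum(dim=1, keepdim=True)   # (B, 1) positive logits
    l_neg = q @ k_neg.t()                          # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(logits, labels)
```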
To sum up, we propose SelfLinKG, a concept learning framework that tackles the large-scale heterogeneous concept linking problem without the arduous and expensive process of producing massive labeled data. We propose to leverage self-supervised learning to learn the intrinsic relations between concepts across the two knowledge bases, which also helps mitigate the scalability issue of handling large-scale data. In the following sections, we introduce in detail the two components that SelfLinKG comprises: 1) local attention-based encoding and 2) global momentum contrastive learning. Figure 3 illustrates the architecture of SelfLinKG.
Local Attention-based Encoding. The local attention-based encoding aims to tackle data heterogeneity and map data from both sources into the same latent space at both the entity level and the graph level. At the entity level, both semantic and structural information are involved; we design a heterogeneous graph-attention-based encoder to aggregate information from the taxonomy structures (both hierarchy and neighborhood), as sketched below. At the graph level, we formulate taxonomies, encyclopedias, and knowledge graphs as unified attributed graphs with two types of relations (hyponym and related) to simplify the problem.
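To convey the flavour of such relation-aware attention aggregation over hyponym and related neighbours, the following simplified, hypothetical module is shown; it is not the exact SelfLinKG encoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAwareAttention(nn.Module):
    """Minimal sketch of attention-based neighbourhood aggregation over
    the two relation types (0: hyponym, 1: related)."""

    def __init__(self, dim: int, num_relations: int = 2):
        super().__init__()
        # One projection per relation type, so hyponym and related
        # neighbours contribute through different transformations.
        self.rel_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_relations)])
        self.attn = nn.Linear(2 * dim, 1)

    def forward(self, center, neighbors, rel_ids):
        # center: (d,) concept embedding; neighbors: (N, d); rel_ids: list of ints
        msgs = torch.stack([self.rel_proj[r](h) for h, r in zip(neighbors, rel_ids)])
        scores = self.attn(torch.cat([center.expand_as(msgs), msgs], dim=-1))
        alpha = F.softmax(scores, dim=0)                # attention over neighbours
        return F.relu(center + (alpha * msgs).sum(0))   # aggregated concept vector
```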
Global Momentum Contrastive Learning. After encoding concepts into vectors in the first step, we propose to use