ERNIE: Enhanced Language Representation with Informative Entities
Zhengyan Zhang 1,2,3∗, Xu Han 1,2,3∗, Zhiyuan Liu 1,2,3†, Xin Jiang 4, Maosong Sun 1,2,3, Qun Liu 4
1 Department of Computer Science and Technology, Tsinghua University, Beijing, China
2 Institute for Artificial Intelligence, Tsinghua University, Beijing, China
3 State Key Lab on Intelligent Technology and Systems, Tsinghua University, Beijing, China
4 Huawei Noah's Ark Lab, Huawei Technologies
∗ Indicates equal contribution.
† Corresponding author: Zhiyuan Liu (liuzy@tsinghua.edu.cn)
Abstract
Neural language representation models such as BERT, pre-trained on large-scale corpora, can capture rich semantic patterns from plain text and be fine-tuned to consistently improve the performance of various NLP tasks. However, existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which takes full advantage of lexical, syntactic, and knowledge information simultaneously. Experimental results demonstrate that ERNIE achieves significant improvements on various knowledge-driven tasks while remaining comparable with the state-of-the-art model BERT on other common NLP tasks. The source code of this paper is available at https://github.com/thunlp/ERNIE.
1 Introduction
Pre-trained language representation models, including feature-based (Mikolov et al., 2013; Pennington et al., 2014; Peters et al., 2017, 2018) and fine-tuning (Dai and Le, 2015; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al., 2018) approaches, can capture rich language information from text and then benefit many NLP applications. BERT (Devlin et al., 2018), as one of the most recently proposed models, obtains state-of-the-art results on various NLP applications by simple fine-tuning, including named entity recognition (Sang and De Meulder, 2003), question answering (Rajpurkar et al., 2016; Zellers et al., 2018), natural language inference (Bowman et al., 2015), and text classification (Wang et al., 2018).
[Figure 1: a knowledge graph over the sentence "Bob Dylan wrote Blowin' in the Wind in 1962, and wrote Chronicles: Volume One in 2004.", with is_a edges typing Blowin' in the Wind as Song, Chronicles: Volume One as Book, and Bob Dylan as Songwriter and Writer, plus composer and author edges from Bob Dylan to the two works.]
Figure 1: An example of incorporating extra knowledge information for language understanding. The solid lines present the existing knowledge facts. The red dotted lines present the facts extracted from the sentence in red. The blue dot-dash lines present the facts extracted from the sentence in blue.
Although pre-trained language representation models have achieved promising results and worked as a routine component in many NLP tasks, they neglect to incorporate knowledge information for language understanding. As shown in Figure 1, without knowing that Blowin' in the Wind and Chronicles: Volume One are a song and a book respectively, it is difficult to recognize the two occupations of Bob Dylan, i.e., songwriter and writer, in the entity typing task. Furthermore, it is nearly impossible to extract the fine-grained relations, such as composer and author, in the relation classification task. For the existing pre-trained language representation models, these two sentences are syntactically ambiguous, like "UNK wrote UNK in UNK". Hence, considering rich knowledge information can lead to better language understanding and accordingly benefit various knowledge-driven applications, e.g., entity typing and relation classification.
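To make the motivating example concrete, the knowledge facts in Figure 1 can be written as (head, relation, tail) triples. The following minimal Python sketch is purely illustrative and not part of the paper; the entity and relation names are taken from Figure 1, and the helper functions are hypothetical stand-ins for how a KG exposes entity types and fine-grained relations that a text-only model cannot recover from "UNK wrote UNK in UNK":

# Illustrative sketch only (not from the paper): the knowledge facts of Figure 1
# written as (head, relation, tail) triples, with toy helpers for entity typing
# and relation classification lookups.

KG_TRIPLES = [
    ("Blowin' in the Wind", "is_a", "Song"),
    ("Chronicles: Volume One", "is_a", "Book"),
    ("Bob Dylan", "is_a", "Songwriter"),
    ("Bob Dylan", "is_a", "Writer"),
    ("Bob Dylan", "composer", "Blowin' in the Wind"),
    ("Bob Dylan", "author", "Chronicles: Volume One"),
]

def entity_types(entity):
    """Types attached to an entity via is_a edges (entity typing)."""
    return {t for h, r, t in KG_TRIPLES if h == entity and r == "is_a"}

def relations_between(head, tail):
    """Fine-grained relations between two entities (relation classification)."""
    return {r for h, r, t in KG_TRIPLES if h == head and t == tail and r != "is_a"}

print(entity_types("Bob Dylan"))                                # e.g. {'Songwriter', 'Writer'}
print(relations_between("Bob Dylan", "Blowin' in the Wind"))    # {'composer'}
print(relations_between("Bob Dylan", "Chronicles: Volume One")) # {'author'}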