knowledge can be integrated much more easily into neural network based models after knowledge representation learning.
Translational Distance Models With distance-based scoring functions, this type of model measures the plausibility of a fact as the distance between the two entities after a translation carried out by the relation. Inspired by the linguistic regularities in [38], TransE [39] represents entities and relations in a d-dimensional vector space so that the embedded entities h and t can be connected by the translation vector r, i.e., h + r ≈ t when (h, r, t) holds.
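As a minimal sketch (assuming the L2 distance; the original work also allows L1, and all names here are illustrative), the TransE scoring function can be written as:

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """TransE plausibility score: negative L2 distance ||h + r - t||.
    Higher (less negative) scores indicate more plausible triplets."""
    return -np.linalg.norm(h + r - t, ord=2)

# Toy check: a triplet that holds should score close to zero.
d = 50
h, r = np.random.randn(d), np.random.randn(d)
t = h + r + 0.01 * np.random.randn(d)  # t ≈ h + r when (h, r, t) holds
print(transe_score(h, r, t))
```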
To tackle the insufficiency of a single space for both entities and relations, TransH [40] and TransR [41] allow an entity to have distinct
representations when involved in different relations. TransH
introduces relational hyperplanes assuming that entities
and relations share the same semantic space, while TransR exploits a separate space for each relation to consider different attributes of entities. TransD [42] argues that entities serve as different types even under the same relation and constructs dynamic mapping matrices by considering the interactions between entities and relations. Owing to the heterogeneity and imbalance of entities and relations, TranSparse [43] simplifies TransR by enforcing sparseness on the projection matrices.
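To make the contrast between the projection strategies above concrete, here is a minimal sketch of the TransH and TransR scoring functions (parameter names such as w_r and M_r follow common notation and are illustrative, not taken from the original implementations):

```python
import numpy as np

def transh_score(h, r, t, w_r):
    """TransH: project entities onto the relational hyperplane with unit
    normal vector w_r, then translate by r within that hyperplane."""
    h_proj = h - np.dot(w_r, h) * w_r
    t_proj = t - np.dot(w_r, t) * w_r
    return -np.linalg.norm(h_proj + r - t_proj)

def transr_score(h, r, t, M_r):
    """TransR: map entities from entity space into the relation-specific
    space via the projection matrix M_r, then translate by r."""
    return -np.linalg.norm(M_r @ h + r - M_r @ t)
```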
Semantic Matching Models Semantic matching models measure the plausibility of facts by matching latent semantics of entities and relations with similarity-based scoring functions. RESCAL [44] associates each entity with a vector and each relation with a matrix; the score of a fact (h, r, t) is defined by a bilinear function. To reduce computational complexity, DistMult [45] simplifies RESCAL by restricting relation matrices to be diagonal. Combining the expressive
power of RESCAL with the efficiency and simplicity of Dist-
Mult, HolE [46] composes the entity representations with
the circular correlation operation, and the compositional
vector is then matched with the relation representation to
score the triplet. Unlike the models above, SME [47] conducts semantic matching between entities and relations using neural network architectures. NTN [48] combines projected entities with a relational tensor and predicts scores through a relational linear output layer.
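For illustration, the similarity-based scoring functions of RESCAL, DistMult, and HolE can be sketched as follows (a simplification that omits training details; the FFT identity is a standard way to compute circular correlation):

```python
import numpy as np

def rescal_score(h, M_r, t):
    """RESCAL: bilinear score h^T M_r t with a full relation matrix M_r."""
    return h @ M_r @ t

def distmult_score(h, r, t):
    """DistMult: RESCAL with M_r restricted to diag(r), i.e. h^T diag(r) t."""
    return np.sum(h * r * t)

def hole_score(h, r, t):
    """HolE: match the relation vector r against the circular correlation
    of h and t, computed via corr(h, t) = ifft(conj(fft(h)) * fft(t))."""
    corr = np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(t)).real
    return r @ corr
```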
Graph Neural Network Models The above models embed entities and relations using only facts stored as a collection of triplets, whereas graph neural network based models take the whole structure of the graph into account. The graph convolutional network (GCN), first proposed in [49] and refined through continuous efforts [50], [51], [52], [53], has become an effective tool for creating node embeddings by aggregating local information from the graph neighborhood of each node.
As an extension of graph convolutional networks, R-GCN
[54] is developed to deal with the highly multi-relational
data characteristic of realistic knowledge bases. SACN [55]
employs an end-to-end network learning framework where
the encoder leverages graph node structure and attributes,
and the decoder simplifies ConvE [56] and keeps the trans-
lational property of TransE. Following the same framework
of SACN, Nathani et al. [57] propose an attention-based
feature embedding that captures both entity and relation
features in the encoder. Vashishth et al. [58] argue that the combination of relations and nodes should be considered comprehensively during message passing. They therefore propose CompGCN, which leverages various entity-relation composition operations from knowledge graph embedding techniques and scales with the number of relations to embed both nodes and relations jointly.
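As a rough sketch of the relation-specific message passing behind R-GCN (omitting the basis decomposition and other regularization from [54]; the names and the per-relation mean normalizer are illustrative):

```python
import numpy as np

def rgcn_layer(X, adjs, W_rel, W_self):
    """One simplified R-GCN layer.
    X: (num_nodes, d_in) node features; adjs[r]: (num_nodes, num_nodes)
    adjacency matrix of relation r; W_rel[r], W_self: (d_in, d_out).
    Each node aggregates neighbor messages per relation, normalized by
    its per-relation in-degree, plus a self-loop term."""
    out = X @ W_self  # self-connection
    for A, W in zip(adjs, W_rel):
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # c_{i,r}
        out += (A @ (X @ W)) / deg
    return np.maximum(out, 0.0)  # ReLU nonlinearity
```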
3 OVERVIEW OF KNOWLEDGE ENHANCED PRE-TRAINED MODELS
3.1 The Motivation of Knowledge Enhanced Pre-trained Models
The recent rapid development of pre-trained models has attracted much attention from researchers. However, despite the great effort invested in their creation, pre-trained models suffer from an inability to understand the deep semantics of text and to perform logical reasoning. In addition, the knowledge learned by these models resides in their parameters and is uninterpretable. Poor robustness and the lack of interpretability can be greatly alleviated by infusing entity features and factual knowledge from KGs. We refer to the models that integrate knowledge through retrieval or injection as KEPTMs.
Most of the pre-trained models introduced in this paper focus on leveraging linguistic knowledge and world knowledge, which belong to the factual or conceptual knowledge defined in Section 2.2.1. This kind of knowledge provides rich information about entities and relations for pre-trained models, sharply improving their capability for deep understanding and reasoning.
3.2 A Taxonomy of Knowledge Enhanced Pre-trained Models
To compare and analyze existing KEPTMs, we first categorize them into three groups according to the type of injected knowledge: entity enhanced pre-trained models, triplet enhanced pre-trained models, and other knowledge enhanced pre-trained models.
Entity enhanced pre-trained models all store knowledge and language information within the parameters of the pre-trained model and thus belong to coupled-based KEPTMs. We further classify them into entity features fused and knowledge graph supervised pre-trained models according to the method of entity injection.
For triplet enhanced pre-trained models, we divide them into coupled-based and decoupled-based KEPTMs according to whether the triplets and the corpus are coupled. Since coupled-based KEPTMs entangle word embeddings and knowledge embeddings during pre-training, they fail to maintain the interpretability of symbolic knowledge. We further categorize coupled-based KEPTMs into three groups according to the method of triplet infusion: embedding combined, data structure unified, and joint training KEPTMs. Decoupled-based KEPTMs, in contrast, preserve the embeddings of knowledge and language separately and thus preserve the interpretability of symbolic knowledge. We classify them as retrieval-based KEPTMs because they utilize knowledge by retrieving relevant information.
Other knowledge enhanced pre-trained models can also be categorized into coupled-based and decoupled-based KEPTMs, which we further divide into joint training and retrieval-based KEPTMs, respectively.