Learning Conceptual-Contextual Embeddings for Medical Text

Xiao Zhang^1∗, Dejing Dou^3,4, Ji Wu^1,2
^1 Department of Electronic Engineering, Tsinghua University
^2 Institute for Precision Medicine, Tsinghua University
^3 Department of Computer and Information Science, University of Oregon
^4 Baidu Research
xzhang19@mails.tsinghua.edu.cn, dou@cs.uoregon.edu, doudejing@baidu.com, wuji_ee@mail.tsinghua.edu.cn

∗ Work done while visiting the University of Oregon
Abstract
External knowledge is often useful for natural language understanding tasks. We introduce a contextual text representation model called Conceptual-Contextual (CC) embeddings, which incorporates structured knowledge into text representations. Unlike entity embedding methods, our approach encodes a knowledge graph into a context model. CC embeddings can be easily reused for a wide range of tasks in a similar fashion to pre-trained language models. Our model effectively encodes the large UMLS knowledge base by leveraging semantic generalizability. Experiments on electronic health records (EHRs) and medical text processing benchmarks show that our model gives a major boost to the performance of supervised medical NLP tasks.
Introduction
External knowledge is often useful for language understanding tasks. Especially in specialized domains like medicine, models are unlikely to attain human-level performance in text understanding without referring to external domain knowledge. Ontologies and knowledge graphs are the most common forms of domain knowledge, but because of their structured nature, it is not straightforward to incorporate them into representation-based neural models.
Current approaches usually bridge text and knowledge graphs with retrieval: triplets or entities are retrieved based on occurrences of the text tokens in the entity descriptions. After retrieval, triplets can be treated as text sequences and provided to the model as extra input (Mihaylov and Frank 2018). Another method is to use the corresponding entity embeddings from a graph embedding model trained on the knowledge graph (Huang et al. 2019); however, one still needs to address the alignment between entity embeddings and text representations.
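As a rough illustration of the first, retrieval-based strategy, the sketch below scores knowledge-graph triplets by token overlap with the input text and verbalizes them as extra model input. The overlap scoring and the triplet format are simplifying assumptions for exposition, not the cited authors' exact procedures.

```python
# Illustrative sketch of retrieval-based knowledge injection (hypothetical scoring).
# Triplets whose entity names share tokens with the input text are retrieved and
# verbalized so they can be appended to the model input as extra text.

def retrieve_triplets(text_tokens, knowledge_graph, top_k=3):
    """knowledge_graph: list of (head, relation, tail) with string entity names."""
    text_vocab = set(t.lower() for t in text_tokens)
    scored = []
    for head, relation, tail in knowledge_graph:
        # Score a triplet by token overlap between the text and its entity names.
        entity_tokens = set(head.lower().split()) | set(tail.lower().split())
        overlap = len(text_vocab & entity_tokens)
        if overlap > 0:
            scored.append((overlap, (head, relation, tail)))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [triplet for _, triplet in scored[:top_k]]

def verbalize(triplet):
    head, relation, tail = triplet
    return f"{head} {relation.replace('_', ' ')} {tail}"

# Example: append retrieved, verbalized knowledge to the model input.
kg = [("cortisone", "may_prevent", "rheumatoid arthritis")]
tokens = "cardiovascular involvement in rheumatoid arthritis".split()
extra_input = " ; ".join(verbalize(t) for t in retrieve_triplets(tokens, kg))
```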
In this paper, we take a novel approach that brings external knowledge into the realm of text representation learning. Word embedding models like skip-gram (Mikolov et al. 2013a) and contextual embedding models like BERT (Devlin et al. 2018) have demonstrated the crucial role of good text
representations in NLP tasks. Our model incorporates external knowledge into text representations, which makes the knowledge easy to apply and robust to variations in how concepts are expressed in text.
Our model, which we term Conceptual-Contextual (CC) embeddings, is a contextual text representation model similar to BERT. Instead of providing general text representations, CC embeddings are specifically designed to be “concept aware”: the model is trained to recognize concept and entity names in text and to produce representations of those concepts and entities. Knowledge from knowledge graphs is encoded in these representations, which can then be readily used in NLP tasks. Like other contextual representation models, the CC embedding model can be used to generate embeddings as features or fine-tuned for a supervised learning task.
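Both usage modes follow the standard encoder-plus-classifier pattern. The sketch below assumes a hypothetical `cc_model` encoder with a BERT-like interface (token ids in, contextual vectors out); the class and argument names are illustrative, not a released API.

```python
import torch.nn as nn

# Hypothetical interface: `cc_model` is a pretrained CC embedding encoder that maps
# token ids to contextual vectors of size `hidden`, analogous to a BERT encoder.

class TaskModel(nn.Module):
    def __init__(self, cc_model, hidden, num_labels, finetune=True):
        super().__init__()
        self.cc_model = cc_model
        if not finetune:
            # Feature-based use: freeze the CC encoder and train only the classifier.
            for p in self.cc_model.parameters():
                p.requires_grad = False
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, token_ids):
        states = self.cc_model(token_ids)   # (batch, seq_len, hidden)
        pooled = states.mean(dim=1)         # simple mean pooling over tokens
        return self.classifier(pooled)
```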
The rest of the paper is organized as follows: we first formulate our approach and discuss why it is particularly relevant to the medical domain. We then detail our model and the process of encoding a large knowledge graph into contextual representations. Finally, we evaluate on several tasks to validate the effectiveness of our CC embeddings.
Methodology
Model
[Figure 1: Encoding concept mentions in text. Mentions of “Cortisone” and “Rheumatoid Arthritis” in two example sentences are encoded into representations of UMLS concepts (C0003873, C0010137), which are connected by the relation may_prevent and composed as concept_a + relation = concept_b.]
The core component of the CC embedding model is an encoder that encodes structured knowledge. The encoder takes a written form of a concept as input and outputs a vector representation of that concept. The idea is illustrated in Figure 1.
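A minimal sketch of such a concept encoder is given below. The BiLSTM-with-mean-pooling architecture and the translation-style composition mentioned in the final comment are illustrative assumptions for exposition, not necessarily the architecture adopted later in the paper.

```python
import torch.nn as nn

# Minimal sketch of a concept encoder: the written form of a concept (e.g. "cortisone")
# is tokenized and encoded into a single vector. The BiLSTM + mean pooling here is an
# illustrative choice, not necessarily the model used in the paper.

class ConceptEncoder(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, token_ids):                    # (batch, seq_len)
        states, _ = self.rnn(self.embed(token_ids))  # (batch, seq_len, dim)
        return states.mean(dim=1)                    # one vector per concept mention

# As suggested by Figure 1, a translation-style objective (an assumption here) would
# relate encoded concepts through relation vectors, e.g. enc("cortisone") + r_may_prevent
# lying close to enc("rheumatoid arthritis").
```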