Knowledge-driven Encode, Retrieve, Paraphrase for
Medical Image Report Generation
Christy Y. Li∗1, Xiaodan Liang†2, Zhiting Hu2, Eric P. Xing3
1Duke University, 2Carnegie Mellon University, 3Petuum, Inc.
yl558@duke.edu, {xiaodan1,zhitingh}@cs.cmu.edu, eric.xing@petuum.com
Abstract
Generating long and semantically coherent reports to describe
medical images poses great challenges in bridging visual
and linguistic modalities, incorporating medical domain
knowledge, and generating realistic and accurate descriptions.
We propose a novel Knowledge-driven Encode, Retrieve,
Paraphrase (KERP) approach which reconciles traditional
knowledge- and retrieval-based methods with modern
learning-based methods for accurate and robust medical re-
port generation. Specifically, KERP decomposes medical re-
port generation into explicit medical abnormality graph learn-
ing and subsequent natural language modeling. KERP first
employs an Encode module that transforms visual features
into a structured abnormality graph by incorporating prior
medical knowledge; then a Retrieve module that retrieves text
templates based on the detected abnormalities; and lastly, a
Paraphrase module that rewrites the templates according to
specific cases. The core of KERP is a proposed generic imple-
mentation unit—Graph Transformer (GTR) that dynamically
transforms high-level semantics between graph-structured
data of multiple domains such as knowledge graphs, images
and sequences. Experiments show that the proposed approach
generates structured and robust reports supported by accurate
abnormality descriptions and explainable attentive regions,
achieving state-of-the-art results on two medical
report benchmarks, with the best medical abnormality and
disease classification accuracy and improved human evaluation
performance.
Introduction
Beyond the traditional image captioning task (Xu et al.
2015; Karpathy and Fei-Fei 2015; Rennie et al. 2017) that
produces single-sentence descriptions, generating long and
semantically coherent stories or reports to describe visual
contents (e.g., images, videos) has recently attracted increasing
research interest (Liang et al. 2017; Huang et al. 2016;
Krause et al. 2017), and poses a more challenging
and realistic goal towards bridging visual patterns with
human linguistic descriptions. In particular, an outstanding
challenge in modeling long narratives from visual content is
∗This work was done when Christy Y. Li was at Petuum, Inc.
†Corresponding author.
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
to balance between knowledge discovery and language modeling
(Karpathy and Fei-Fei 2015). Current visual text generation
approaches tend to produce sentences that sound plausible
under the language model but are poorly grounded in the
visual content. Although some approaches have been
proposed to alleviate this problem (Lu et al. 2018; Anderson et
al. 2018; Liang et al. 2017), most of them ignore the inter-
nal knowledge structure of the task at hand. However, most
real-world data and problems exhibit complex and dynamic
structures such as intrinsic relations among discrete enti-
ties under nature’s law (Taskar, Guestrin, and Koller 2004;
Hu et al. 2016; Strubell et al. 2018). The knowledge graph,
one of the most powerful representations of dynamic graph-structured
knowledge (Mitchell et al. 2018; Bizer, Heath,
and Berners-Lee 2011), complements learning-based approaches
by explicitly modeling the domain-specific knowledge
structure and relational inductive bias. Knowledge
graphs also allow incorporating priors, which has proven
useful for tasks where universal knowledge is desired or
certain constraints must be met (Battaglia et al. 2017;
Liang, Hu, and Xing 2018; Hu et al. 2018; X. Wang 2018).
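As a toy illustration of such explicit structure (a hypothetical sketch: the entities, the "suggests" relation, and the `KnowledgeGraph` helper below are illustrative stand-ins chosen for this example, not taken from the paper or from any real medical ontology), a domain knowledge graph can be stored as a simple relational structure and queried for priors:

```python
# Toy knowledge graph linking findings to related diseases.
# All entities and edges are illustrative, not a real medical ontology.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # Each node maps to a set of (relation, neighbor) pairs.
        self.edges = defaultdict(set)

    def add(self, head, relation, tail):
        self.edges[head].add((relation, tail))

    def related(self, node, relation):
        """Return all neighbors of `node` reached via `relation`."""
        return sorted(t for r, t in self.edges[node] if r == relation)

kg = KnowledgeGraph()
kg.add("opacity", "suggests", "pneumonia")
kg.add("opacity", "suggests", "effusion")
kg.add("blunted costophrenic angle", "suggests", "effusion")
kg.add("enlarged heart", "suggests", "cardiomegaly")

# A detector that finds "opacity" can consult the graph as a prior
# over which diseases are worth describing in the report.
print(kg.related("opacity", "suggests"))  # -> ['effusion', 'pneumonia']
```

In this sketch the graph constrains generation the way a prior would: only diseases reachable from detected findings are considered, which is one simple way to impose the relational inductive bias discussed above.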
As an emerging long text generation task of practical
use, medical image report generation (Li et al. 2018;
Jing, Xie, and Xing 2018) must satisfy stricter protocols
and ensure correct usage of medical terminology.
As shown in Figure 1, a medical report consists of a
finding section describing medical observations in detail,
covering both normal and abnormal features; an impression or
conclusion sentence indicating the most prominent medical
observation; and peripheral sections such as patient information
and indications. Among these sections, the finding section
is considered the most important component and is expected
to 1) cover contents of key relevant aspects such as
heart size, lung opacity, and bone structure; 2) correctly de-
tect any abnormalities and support with details such as the
location and shape of the abnormality; 3) describe potential
diseases such as effusion, pneumothorax and consolidation.
It is often observed that, to write a medical image report,
radiologists first check a patient’s images for abnormal find-
ings, then write reports by following certain patterns and
templates, and adjusting statements in the templates for each
individual case when necessary (Hong and Kahn 2013). To
mimic this procedure, we propose to formulate medical
report writing as a knowledge-driven encode, retrieve, paraphrase