International Journal of Advanced Trends in Computer Science and Engineering
Volume 8, No. 6, November – December 2019
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse45862019.pdf
https://doi.org/10.30534/ijatcse/2019/45862019

Survey on Abstractive Text Summarization using various approaches

Arun Krishna Chitturi¹, Saravanakumar Kandaswamy²
¹Vellore Institute of Technology, Vellore, India, chitturiarunkrishna@gmail.com
²Professor at Vellore Institute of Technology, Vellore, India, ksaravanakumar@vit.ac.in
ABSTRACT
Text summarization is one of the core aspects of Natural Language Processing. A summarized text should consist of unique sentences. Summarization is used in many situations in today's information technology world; one of the best examples is understanding customer feedback in companies. This job can be done by humans, but if the volume of text or data to be summarized is large, it consumes a lot of time and workforce. This situation led to the birth of different approaches to summarization. This paper concentrates on various methods and approaches to abstractive text summarization and their results. The survey gives an insight into the different types of text summarization and the various methods used in recent developments in abstractive text summarization.
Key words: abstractive summarization, decoder, encoder, multi-document summarization
1. INTRODUCTION
Summarization is highly useful in today's world. The main aim of abstractive text summarization is to produce a shortened version of the input text that preserves its relevant meaning [7]. The adjective "abstractive" is used because the generated summary is not a combination or selection of repeated sentences, but a paraphrasing of the core contents of the input document [8]. Abstractive summarization is a very difficult problem in its own right, distinct from machine translation. The main challenge in abstractive text summarization (ATS) is to compress the matter of the input document in an optimized way so that the main concepts of the document are not missed [8]. In the current technologically advancing world, the volume of data is increasing and it is very difficult to read the required data in a short time [6]. It is a demanding task to collect the required information and then convert it into summarized form. Therefore, text summarization came into demand. Summarized text saves time and helps in avoiding the retrieval of massive amounts of text.
Abstractive text summarization can be combined with numerous intelligent systems based on NLP technologies, such as information retrieval, question answering, and text classification, to find particular information [9]. If latent structure information of the summaries can be incorporated into the abstractive summarization model, then the quality of the generated summaries can be improved [10]. In some research works, topic models are used to capture the latent information from the input paragraphs or documents.
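Where topic models are used in this way, a minimal sketch of the general idea is given below using scikit-learn's LDA implementation on toy documents; the data, parameter choices, and library are illustrative assumptions, not the latent-structure model of [10].

# Minimal sketch (assumption: scikit-learn is available): LDA topic modelling
# to expose latent themes in input documents before summarization.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the battery life of this phone is excellent and charging is fast",
    "customer support answered quickly and resolved the billing issue",
    "screen quality is great but the battery drains too fast",
]

# Bag-of-words counts feed the topic model.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Fit a two-topic LDA model; each topic is a distribution over words.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-document topic proportions

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print("topic", k, ":", top_words)
print(doc_topics)  # latent topic mixture of each input document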
Among its many hurdles, abstractive text summarization faces two core issues: (i) neural sequence-to-sequence models tend to produce generic summaries consisting mostly of frequently used phrases, and (ii) the generated summaries are less readable and not grammatically perfect [11].
Summarization is divided into the following types: (a) extractive text summarization and (b) abstractive text summarization [6]. Extractive summarization extracts the frequently used or most precise phrases without modifying them and generates the summary from them, whereas abstractive summarization generates new sentences and also optimally decreases the length of the document. Abstractive summarization is qualitatively better than extractive summarization, as it can take data from multiple documents and generate a precise summary. Abstractive summarization is in turn achieved in two ways: (a) the structure-based approach and (b) the semantic-based approach.
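For contrast with the abstractive methods surveyed below, the following is a minimal Python sketch of frequency-based extractive summarization, which scores sentences by the frequency of their words and copies the top-scoring ones verbatim; the scoring scheme is a simple illustrative assumption, not a method taken from the cited works.

# Minimal sketch of extractive summarization: rank sentences by average word
# frequency and return the highest-ranked sentences unchanged.
from collections import Counter
import re

def extractive_summary(text, num_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # Keep the top-scoring sentences, preserving their original order.
    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    return " ".join(s for s in sentences if s in top)

doc = ("The battery lasts all day. The camera is average. "
       "Battery charging is quick. The phone feels premium.")
print(extractive_summary(doc, num_sentences=2))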
Neural network models based on the encoder-decoder architecture for machine translation have achieved good ROUGE scores [12]. Abstractive approaches generate summaries similar to those produced by humans, but they are more expensive [13]. In an attentive RNN, the encoder computes a score over the input sentences on the basis of the current state of the RNN [14].
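To make the attention-scoring idea concrete, the NumPy sketch below shows dot-product attention: the decoder's current hidden state scores every encoder state, and the normalized scores weight a context vector. The shapes and the dot-product scoring function are illustrative assumptions, not the exact formulation of [14].

# Minimal sketch of attention: the current decoder state is compared against
# every encoder state to produce a weight per input position.
import numpy as np

def attention(decoder_state, encoder_states):
    scores = encoder_states @ decoder_state      # one scalar score per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over input positions
    context = weights @ encoder_states           # weighted sum of encoder states
    return weights, context

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))  # 5 input positions, hidden size 8
decoder_state = rng.normal(size=8)        # current decoder hidden state
weights, context = attention(decoder_state, encoder_states)
print(weights)   # attention distribution over the 5 input positions
print(context)   # context vector used for the next output word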
The main problems in ATS are (a) long-document summarization, (b) the lack of a truly abstractive evaluation metric, and (c) controlling the output length. F1 scores are generally evaluated using ROUGE metrics [15]. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric was proposed by Lin (2004) [24].
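As a rough illustration of how a ROUGE-style F1 score is obtained, the sketch below computes unigram overlap between a candidate summary and a reference (in the spirit of ROUGE-1); actual evaluations in the surveyed papers use the full ROUGE toolkit rather than this simplified version.

# Simplified ROUGE-1 F1: clipped unigram overlap between candidate and reference.
from collections import Counter

def rouge1_f1(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())      # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat", "the cat was sitting on the mat"))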
Named Entity Recognition is also one of the core applications of NLP and helps in removing ambiguity [28]. Information Retrieval is likewise highly difficult and requires quality documents [37].
2. SURVEY
2.1 Semantic Link Network for Summarization [1]
A Semantic Link Network (SLN) is a self-formulated semantic model for semantically organizing resources to support advanced information services such as abstractive text summarization [1]. According to the author, the semantic link network used in abstractive text summarization has the following important components: