An End-to-End Neural Network Approach to Story Segmentation

Jia Yu∗, Lei Xie∗‡, Xiong Xiao†, Eng Siong Chng†

∗Shaanxi Provincial Key Laboratory of Speech and Image Information Processing,
School of Computer Science, Northwestern Polytechnical University, Xi’an, China
†School of Computer Engineering, Nanyang Technological University, Singapore
E-mail: {jiayu,lxie}@nwpu-aslp.org, {xiaoxiong,ASESChng}@ntu.edu.sg
Abstract— This paper proposes an end-to-end story segmentation
approach based on a long short-term memory (LSTM)
recurrent neural network (RNN). Traditional story segmentation
approaches form a two-stage pipeline consisting of feature
extraction and segmentation, each of which has its own objective
function. In other words, the objective function used to extract
features is different from the true performance measure of story
segmentation, which may degrade the segmentation results. In
this paper, we combine the two components and optimize them
jointly, using an LSTM-RNN. Specifically, one LSTM layer is
used to extract sentence vectors, and another LSTM layer is used
to predict story boundaries, taking the sentence vectors as
input. Importantly, the whole network is optimized directly
to predict story boundaries. We also investigate a bi-directional
LSTM (BLSTM), which can exploit both past and future
information when extracting sentence vectors and predicting
story boundaries. Experimental results on the TDT2 corpus show that
the proposed approach achieves state-of-the-art performance in
story segmentation.
I. INTRODUCTION
Story segmentation is the task of partitioning a stream of
audio, video or text into story segments, each addressing a
specific topic. It is a necessary precursor for a variety of
language processing technologies including content indexing
and retrieval [1], document summarization [2], topic detection
and tracking [3], [4] and information extraction [5]. Typical
story segmentation approaches form a pipeline consisting of
feature learning and segmentation. The two components are
not optimized jointly for story segmentation; instead, independent
assumptions are made for each component [6], [7], [8], [9].
Recently, end-to-end (E2E) neural network (NN) learning that
jointly optimizes all components (e.g., in speech recognition)
has achieved promising results [10], [11], [12]. This motivates
us to develop an end-to-end NN approach for the story
segmentation task at hand.
Story segmentation has been studied for different genres,
such as broadcast news [13], [14], meeting recordings [15] and
lectures [16], [17], over various types of media, including
audio [17], [18], [19], video [20] and text [21], [22], [23], [24],
[6], [15]. In this paper, we aim to perform story segmentation
for textual documents like broadcast news speech recognition
transcripts. Note that, with the recent tremendous success
of large vocabulary continuous speech recognition (LVCSR)
using deep neural networks (DNN) [25], [26], [27], [28], [29],
[30], [31], we can easily obtain high-accuracy transcripts for
broadcast news. Thus traditional text segmentation approaches,
which serve a similar purpose to story segmentation, can be
readily applied to speech recognition transcripts.

‡Corresponding author
Traditional story segmentation approaches on text form a
pipeline consisting of feature learning, which captures
semantic or topic information from a stream of text, and
segmentation, which partitions the stream into topically
coherent segments by detecting topic shifts.
Feature extraction heavily affects the performance of story
segmentation. The bag-of-words (BOW) representation, or its
term frequency-inverse document frequency (tf-idf) weighted
variant, is a simple representation used in typical story
segmentation approaches, e.g., TextTiling and dynamic
programming (DP) [6], [7], [8]. However, BOW and tf-idf only
count the occurrences of words, ignoring the semantic relations
among them. In contrast, probabilistic latent semantic analysis
(pLSA) [9], latent Dirichlet allocation (LDA) [32] and
LapPLSA [33] employ latent topic variables and build topic
models that describe the probability distribution of words over
topics. With these probabilistic models, BOW-based word
representations are transformed into topic representations and
used in various segmentation approaches [32], [34]. Recently,
neural network based topic models have shown promising
performance [35], [36], [37], [38], [39]. Specifically, in
previous work we derived word representations in topic space
from a neural network based topic model, leading to improved
story segmentation performance [40].
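For concreteness, the tf-idf weighting mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the representation used in any cited system; the tokenized example documents and the plain log(N/df) idf variant are assumptions (practical systems usually add smoothing and normalization):

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute tf-idf vectors for a list of tokenized documents.
    tf = raw term count in the document; idf = log(N / df),
    where df is the number of documents containing the term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]

docs = [
    "the market fell sharply today".split(),
    "the storm hit the coast today".split(),
]
vecs = tfidf(docs)
print(vecs[0]["the"])               # 0.0 -- "the" occurs in every document
print(round(vecs[0]["market"], 3))  # 0.693 -- log(2/1), unique to doc 0
```

Note how terms shared by all documents receive zero weight, which is exactly why tf-idf alone captures no semantic relations between distinct words, motivating the topic models discussed above.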
The second component of the pipeline is the segmenter. The
above-mentioned TextTiling [6], [7] and dynamic programming
(DP) [33], [41], [42], [43] are typical detection-based
approaches, which find an optimal partition of the word
sequence by optimizing a local or global objective. Popular
probabilistic approaches locate story boundaries using the
distribution of topics over documents and the distribution of
words over topics; such approaches include pLSA [34],
BayesSeg [44], dd-CRP [45] and HMMs [23], [24], [21].
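The detection-based idea can be illustrated with a minimal sketch in the spirit of TextTiling: a story boundary is placed at the gap where lexical similarity between adjacent units dips furthest below its neighbors. This is a simplification (per-sentence bag-of-words without the block smoothing of the original algorithm), and the toy sentences are invented:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(c * b.get(t, 0) for t, c in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * \
           math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def depth_scores(sentences):
    """Depth score at each inter-sentence gap: how far the lexical
    similarity dips below the highest similarity seen on each side."""
    sims = [cosine(Counter(sentences[i]), Counter(sentences[i + 1]))
            for i in range(len(sentences) - 1)]
    return [(max(sims[:i + 1]) - s) + (max(sims[i:]) - s)
            for i, s in enumerate(sims)]

# Toy transcript: two finance sentences followed by two weather sentences.
sentences = [
    "stocks fell on wall street".split(),
    "stocks rose on wall street".split(),
    "rain flooded the coastal town".split(),
    "rain hit the coastal region".split(),
]
scores = depth_scores(sentences)
boundary = scores.index(max(scores))  # gap with the deepest similarity dip
print(boundary)  # → 1 (boundary between sentences 1 and 2)
```

The deepest dip correctly falls between the finance and weather sentences; real systems threshold the depth scores rather than taking a single argmax.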
The two components of a story segmentation system are
traditionally modeled independently. The objective function
used to extract features may be substantially different from
the true performance measure of story segmentation. This
sort of inconsistency may degrade the performance of story
segmentation. The purpose of end-to-end (E2E) learning is to
Proceedings of APSIPA Annual Summit and Conference 2017
12 - 15 December 2017, Malaysia
978-1-5386-1542-3@2017 APSIPA