Response Generation in Dialogue using a Tailored PCFG Parser
Caixia Yuan Xiaojie Wang Qianhui He
School of Computer Science
Beijing University of Posts and Telecommunications
{yuancx, xjwang}@bupt.edu.cn
alisonchinabupt@gmail.com
Abstract
This paper presents a parsing paradigm for the natural language generation task, which learns a tailored probabilistic context-free grammar for encoding a meaning representation (MR) and its corresponding natural language (NL) expression, then decodes and yields natural language sentences at the leaves of the optimal parsing tree for a target meaning representation. The major advantage of our method is that it does not require any prior knowledge of the MR syntax for training. We deployed our method in response generation for a Chinese spoken dialogue system, obtaining results comparable to a strong baseline both in terms of BLEU scores and human evaluation.
1 Introduction
Grammar-based natural language generation (NLG) has received considerable attention over the past decade. Prior work has mainly focused on hand-crafted generation grammars (Reiter et al., 2005; Belz, 2008), which are extensive but also expensive to build. Recent work automatically learns a probabilistic regular grammar describing Markov dependencies among fields and word strings (Konstas and Lapata, 2012a; Konstas and Lapata, 2013), or extracts a tree adjoining grammar, provided an alignment lexicon is available, which projects the input semantic variables up the syntactic tree of their natural language expression (Gyawali and Gardent, 2014). Although there is a consensus that, at a rather abstract level, natural language generation can benefit greatly from its counterpart, natural language understanding (NLU), the problem of leveraging NLU resources for NLG still leaves much room for investigation.
In this paper, we propose a purely data-driven natural language generation model which exploits a probabilistic context-free grammar (PCFG) parser to assist natural language generation. The basic idea underlying our method is that the generated sentence is licensed by a context-free grammar, and thus can be deduced from a parsing tree which encodes hidden structural associations between a meaning representation and its sentence expression. A tailored PCFG, i.e., a PCFG easily tailored to application-specific concepts, is learned from pairs of structured meaning representations and their natural language sentences, and is then used to guide the generation process for other previously unseen meaning representations. Table 1 exemplifies a record from the application under consideration.
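To make the decode-at-the-leaves idea concrete, the following is a minimal, hypothetical Python sketch. The rules, nonterminal names (S, FLIGHT, TIME, DEST, HOUR), and the greedy expansion strategy are all invented for illustration; they stand in for the tailored PCFG learned from data and are not the actual grammar or decoder used in this paper.

```python
# A minimal sketch: a PCFG whose nonterminals include application-specific
# conceptual symbols.  We expand the start symbol with the most probable
# rules and read the generated sentence off the leaves of the derivation.
# All rules below are invented for illustration.

# Each nonterminal maps to a list of (probability, right-hand side) rules.
RULES = {
    "S":      [(0.7, ["FLIGHT", "TIME"]), (0.3, ["FLIGHT"])],
    "FLIGHT": [(0.6, ["the", "flight", "to", "DEST"]),
               (0.4, ["a", "flight", "for", "DEST"])],
    "TIME":   [(1.0, ["departs", "at", "HOUR"])],
}

def generate(symbol, mr):
    """Greedily follow the highest-probability derivation (a stand-in for
    full Viterbi decoding) and return the leaf words."""
    if symbol in mr:                 # conceptual leaf filled from the MR
        return [mr[symbol]]
    if symbol not in RULES:          # terminal word
        return [symbol]
    _, rhs = max(RULES[symbol])      # most probable rewrite
    return [w for child in rhs for w in generate(child, mr)]

print(" ".join(generate("S", {"DEST": "Boston", "HOUR": "9am"})))
# -> the flight to Boston departs at 9am
```

The point of the sketch is only the division of labor: conceptual nonterminals are filled from the meaning representation, while the learned rewrite probabilities decide the surrounding wording.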
Our model is closest to those of Konstas and Lapata (2012a) and Konstas and Lapata (2013), who reformulate the Markov structure between a meaning representation and a string of text described in Liang et al. (2009) into a set of CFG rewrite rules, and then deduce the best derivation tree for a database record. Although this Markov structure can capture a few elements of rudimentary syntax, it is essentially not a linguistic grammar. Thus the sentences produced by this model are often ungrammatical (for instance, its 1-BEST model produces grammatically illegal sentences like “Milwaukee Phoenix on Saturday on Saturday on Saturday on Saturday”).
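To see where such repetitions come from, consider a minimal sketch with invented fields and probabilities (not the published grammar of Liang et al. (2009) or Konstas and Lapata): because each field in the Markov structure is conditioned only on its predecessor, nothing prevents a field from re-deriving itself.

```python
import random

# Hypothetical field-to-field transition probabilities: a Markov chain over
# record fields recast as rewrite rules.  Conditioning only on the previous
# field means DATE may legally follow DATE, licensing repeated output.
TRANSITIONS = {
    "TEAM": [("DATE", 0.8), ("STOP", 0.2)],
    "DATE": [("DATE", 0.4), ("STOP", 0.6)],  # a field may follow itself
}
WORDS = {"TEAM": "Milwaukee Phoenix", "DATE": "on Saturday"}

def derive(field="TEAM"):
    """Sample one derivation and return its surface words."""
    if field == "STOP":
        return []
    fields, probs = zip(*TRANSITIONS[field])
    return [WORDS[field]] + derive(random.choices(fields, probs)[0])

print(" ".join(derive()))
# With some probability this prints, e.g.,
# "Milwaukee Phoenix on Saturday on Saturday on Saturday"
```

Under a linguistic grammar, such a repetition would have to be licensed by an explicit syntactic rule; this is exactly the constraint the Markov structure lacks.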
Konstas and Lapata (2013) claim that long-range dependencies are an effective complement to the CFG grammar, and incorporate syntactic dependencies between words into a reranking procedure to enhance performance. Although conceptually similar, our model directly learns more grammatical rewrite rules from hybrid syntactic trees whose nonterminal nodes comprise phrasal nodes inherited from a common syntactic parser and conceptual nodes designed for encoding the target meaning representation. Therefore, the learning aspect of the two models is fundamentally different. We have a single CFG grammar that applies throughout, where-