The State of the Art and Outlook of Natural Language Generation: Core Techniques and Application Evaluation

Natural Language Generation

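"Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation"

Natural Language Generation (NLG) is a key branch of artificial intelligence concerned with generating text or speech from non-linguistic input such as data or knowledge graphs. In recent years, with the rapid development of data-driven methods, NLG has shown great potential in a wide range of applications. The authors, Albert Gatt and Emiel Krahmer, present a comprehensive survey of the current state of the NLG field. The core tasks they cover include, but are not limited to, the following:

1. **Content planning**: the first step of the NLG process, which determines the information to be conveyed and its structure. Content planning may draw on databases, knowledge bases or other non-linguistic input, and requires selecting and ordering the key information so as to produce a coherent narrative.

2. **Text microplanning**: this stage turns the abstract information produced by content planning into concrete words and phrases. It typically involves lexical choice, decisions about syntactic structure, and how to express the information effectively.

3. **Text macroplanning**: macroplanning deals with the higher-level structure of the text, such as paragraph organisation and discourse coherence. This stage ensures that the generated text is coherent and meaningful as a whole.

4. **Generation execution**: finally, the results of micro- and macroplanning are turned into actual text output, which may involve language models, grammatical transformation and lexical choice.

Beyond these core tasks, the article also emphasises the overlap and synergy between NLG and other areas of AI, such as the introduction of machine learning, deep learning and statistical modelling, which have greatly advanced NLG technology. For example, neural models such as the Transformer and BERT have shown strong performance on text generation tasks and can produce more natural and fluent text.

NLG is also being applied in an increasingly wide range of settings, including automatic report generation, chatbots, intelligent assistants, personalised news generation, multilingual translation and dialogue systems for virtual reality. These applications not only improve efficiency, but also play an important role in improving human-computer interaction and the way information is disseminated.

Evaluating the quality of NLG systems is challenging because it involves subjective human judgement. Human evaluation has traditionally been the main method, but in recent years automatic metrics such as BLEU, ROUGE and METEOR have also been widely used; they cannot fully replace human evaluation, but they provide a degree of quantitative reference.

Recent progress shows that NLG is evolving rapidly and is becoming ever more closely integrated with data science, computational linguistics and cognitive science. As the technology develops, more innovative NLG applications can be expected to appear, bringing benefits to everyday life and to many industries.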


vation is made by Mellish & Dale, 1998, p.351). A fundamental contribution in this context is by Elhadad et al. (1997), who describe a unification-based approach, unifying conceptual representations (i.e., preverbal messages) with grammar rules encoding lexical as well as syntactic choices.

2.5 Referring expression generation

Referring Expression Generation (REG) is characterised by Reiter and Dale (1997, p.11) as “the task of selecting words or phrases to identify domain entities”. This characterisation suggests a close similarity to lexicalisation, but Reiter and Dale (2000) point out that the essential difference is that referring expression generation is a “discrimination task, where the system needs to communicate sufficient information to distinguish one domain entity from other domain entities”. REG is among the tasks within the field of automated text generation that has received most attention in recent years (Mellish et al., 2006; Siddharthan et al., 2011). Since it can be separated relatively easily from a specific application domain and studied in its own right, various ‘standalone’ solutions for the REG problem exist.

In our running example, the three bradycardia events shown in Figure 1b are later represented as a set of three entities under the theme argument of be, following lexicalisation (Figure 1c). How the system refers to them will depend, among other things, on whether they’ve already been mentioned (in which case, a pronoun or definite description might work) and if so, whether they need to be distinguished from any other similar entities (in which case, they might need to be distinguished by some properties, such as the time when they occurred).

The first choice is therefore related to referential form: whether entities are referred to using a pronoun, a proper name or an (in)definite description, for example. This depends partly on the extent to which the entity is ‘in focus’ or ‘salient’ (see e.g., Poesio et al., 2004) and indeed such notions underlie many computational accounts of pronoun generation (e.g., McCoy & Strube, 1999; Callaway & Lester, 2002; Kibble & Power, 2004). Choosing referential forms has recently been the topic of a series of shared tasks on the Generation of Referring Expressions in Context (GREC; Belz et al., 2010), using data from Wikipedia articles, which included choices such as reflexive pronouns and proper names. Many systems participating in this challenge framed the problem in terms of classification among these many options. Still, it is probably fair to say that much work on referential form has focussed on when to use pronouns. Forms such as proper names remain understudied, although recently various researchers have highlighted the problems of proper name generation (Siddharthan et al., 2011; van Deemter, 2016; Castro Ferreira et al., 2017).

Determining the referential content usually comes into play when the chosen form is a description. Typically, there are multiple entities which have the same referential category or type in a domain (more than one player, for example, or several bradycardias). As a result, other properties of the entity will need to be mentioned if it is to be identified by the reader or hearer. Earlier REG research often worked with simple visual domains, such as Figure 2a or its corresponding tabular representation, taken from the GRE3D corpus (Viethen & Dale, 2008). In this example, the REG content selection problem is to find a set of properties for a target (say d1) that singles it out from its two distractors (d2 and d3).

(a) Visual domain from the GRE3D corpus (Viethen & Dale, 2008)

        Domain objects
Attr    d1      d2      d3
Color   blue    green   blue
Shape   ball    cube    ball
Size    small   large   large
Rel     bef(d)  beh(d)  nt(d)

(b) Table of objects and attributes. beh: ‘behind’; bef: ‘before’; nt: ‘next to’

Figure 2: Visual domain and corresponding tabular representation

REG content determination algorithms can be thought of as performing a search through the known properties of the referent for the ‘right’ combination that will distinguish it in context. What constitutes the ‘right’ combination depends on the underlying theory. Too much information in the description (as in the small blue ball before the large green cube) might be misleading or even boring; too little (the ball) might hinder identification. Much work on REG has appealed to the Gricean maxim stating that speakers should make sure that their contributions are sufficiently informative for the purposes of the exchange, but not more so (Grice, 1975). How this is interpreted has been the subject of a number of algorithmic interpretations, including:

• Conducting an exhaustive search through the space of possible descriptions and choosing the smallest set of properties that will identify the target referent, the strategy incorporated by the Full Brevity procedure (Dale, 1989). In our example domain, this would select size.

• Selecting properties incrementally, but choosing the one which rules out most distractors at each step, thereby minimising the possibility of including information that isn’t directly relevant to the identification task. This is the underlying idea of the Greedy Heuristic algorithm (Dale, 1989, 1992), and it has more recently been revived in stochastic utility-based models such as Frank et al. (2009). In our example scene, such an algorithm would once again consider size first.

• Selecting properties incrementally, but based on domain-specific preference or cognitive salience. This is the strategy incorporated in the Incremental Algorithm (Dale & Reiter, 1995), which would predict that color should be preferred over size in our example.
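The three strategies in the list above can be sketched in a few lines of Python. This is a minimal illustration rather than the published algorithms: the object names d1, d2 and d3, the attribute values (read off the table of objects and attributes, with relations left out), the function names, and the preference order color > shape > size used for the incremental strategy are all assumptions made for the example.

```python
from itertools import combinations

# Example domain, assumed from the table of objects and attributes
# (relations are omitted; the object names d1-d3 are an assumption).
DOMAIN = {
    "d1": {"color": "blue", "shape": "ball", "size": "small"},
    "d2": {"color": "green", "shape": "cube", "size": "large"},
    "d3": {"color": "blue", "shape": "ball", "size": "large"},
}


def survivors(target, attrs, candidates):
    """Candidates that share the target's value for every attribute in attrs."""
    wanted = DOMAIN[target]
    return [c for c in candidates
            if all(DOMAIN[c][a] == wanted[a] for a in attrs)]


def full_brevity(target):
    """Exhaustive search for a smallest distinguishing attribute set."""
    others = [d for d in DOMAIN if d != target]
    attrs = sorted(DOMAIN[target])
    for k in range(1, len(attrs) + 1):
        for combo in combinations(attrs, k):
            if not survivors(target, combo, others):
                return list(combo)
    return None  # the target cannot be distinguished


def greedy(target):
    """At each step, add the attribute that rules out the most distractors."""
    remaining = [d for d in DOMAIN if d != target]
    unused = sorted(DOMAIN[target])
    chosen = []
    while remaining and unused:
        best = min(unused, key=lambda a: len(survivors(target, [a], remaining)))
        left = survivors(target, [best], remaining)
        if len(left) == len(remaining):
            break  # no attribute rules out anything further
        chosen.append(best)
        unused.remove(best)
        remaining = left
    return chosen if not remaining else None


def incremental(target, preference=("color", "shape", "size")):
    """Try attributes in a fixed preference order; keep those that help."""
    remaining = [d for d in DOMAIN if d != target]
    chosen = []
    for attr in preference:
        left = survivors(target, [attr], remaining)
        if len(left) < len(remaining):
            chosen.append(attr)
            remaining = left
        if not remaining:
            break
    return chosen if not remaining else None


print(full_brevity("d1"))  # ['size']
print(greedy("d1"))        # ['size']
print(incremental("d1"))   # ['color', 'size']
```

With d1 as the target, the exhaustive and greedy sketches both return size alone, while the incremental sketch returns color and size, because color is tried first and still leaves d3 as a distractor.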

While these heuristics focus exclusively on the requirement that a referent be unambiguously identified, research on reference in dialogue (e.g., Jordan & Walker, 2005) has shown that under certain conditions, referring expressions may also include ‘redundant’ properties in order to achieve other communicative goals, such as confirmation of a previous utterance by an interlocutor. Similarly, White et al. (2010) present a system which generates user-tailored descriptions in spoken dialogue, arguing that, for example, a frequent flyer would prefer different descriptions of flights than a student who only flies occasionally.

These various algorithms compute (possibly different) distinguishing descriptions for target referents (more precisely: they select sets of properties that distinguish the target, but that still need to be expressed in words; see Section 2.6 below). Various strands of more recent work can be distinguished (surveyed in Krahmer & van Deemter, 2012). Some researchers have focussed on extending the expressivity of the ‘classical’ algorithms, to include plurals (the two balls) and relations (the ball in front of a cube) (e.g., Horacek, 1997; Stone, 2000; Gardent, 2002; Kelleher & Kruijff, 2006; Viethen & Dale, 2008, among many others). Other work has cast the problem in probabilistic terms; for example, FitzGerald et al. (2013) frame REG as a problem of estimating a log-linear distribution over a space of logical forms representing expressions for sets of objects. Other work has concentrated on evaluating the performance of different REG algorithms, by collecting controlled human references and comparing these with the references predicted by various algorithms (e.g., Belz, 2008; Gatt & Belz, 2010; Jordan & Walker, 2005, again among many others). In a similar vein, researchers have also started exploring the relevance of REG algorithms as psycholinguistic models of human language production (e.g., van Deemter et al., 2012b).
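As an illustration of the comparison between human references and algorithm output mentioned above, the following sketch scores a predicted property set against a handful of human-produced ones. The choice of the Dice coefficient as the overlap measure is an assumption made here (this section does not name a metric), and the human property sets are invented purely for the example.

```python
def dice(predicted, reference):
    """Dice coefficient between two sets of properties (1.0 = identical)."""
    p, r = set(predicted), set(reference)
    if not p and not r:
        return 1.0  # two empty descriptions are treated as identical
    return 2 * len(p & r) / (len(p) + len(r))


def mean_dice(predicted, human_references):
    """Average overlap of one predicted set with several human descriptions."""
    scores = [dice(predicted, ref) for ref in human_references]
    return sum(scores) / len(scores)


# Hypothetical human descriptions of the same target (invented data).
human = [{"color", "size"}, {"size"}, {"color", "shape", "size"}]

print(mean_dice({"size"}, human))           # a minimal description
print(mean_dice({"color", "size"}, human))  # a description with a redundant property
```

Under these invented references, the description that includes the redundant color property scores higher than the minimal one, which is the kind of pattern such comparisons are designed to reveal.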

A different line of work has moved away from the separation between content selection and form, performing these tasks jointly. For example, Engonopoulos and Koller (2014) use a synchronous grammar that directly relates surface strings to target referents, using a chart to compute the possible expressions for a given target. This work bears some relationship to planning-based approaches we discuss in Section 3.2 below, which exploit grammatical formalisms as planning operators (e.g. Stone & Webber, 1998; Koller & Stone, 2007), solving realisation and content determination problems in tandem (including REG as a special case).

Finally, in earlier work visual information was typically ‘simplified’ into a table (as we did above), but there has been substantial progress on REG in more complex scenarios. For example, the GIVE challenge (Koller et al., 2010) provided impetus for the exploration of situated reference to objects in a virtual environment (see also Stoia & Shockley, 2006; Garoufi & Koller, 2013). More recent work has started exploring the interface between computer vision and REG to produce descriptions of objects in complex, realistic visual scenes, including photographs (e.g., Mitchell et al., 2013; Kazemzadeh et al., 2014; Mao et al., 2016). This forms part of a broader set of developments focussing on the relationship between vision and language, which we turn to in Section 4.

2.6 Linguistic realisation

Finally, when all the relevant words and phrases have been decided upon, these need to be combined to form a well-formed sentence. The simple example in Figure 1d shows the structure underlying the sentence there were three successive bradycardias down to 69, the linguistic message corresponding to the portion selected from the original signal in Figure 1a.

Usually referred to as linguistic realisation, this task involves ordering constituents of a sentence, as well as generating the right morphological forms (including verb conjugations and agreement, in those languages where this is relevant). Often, realisers also need to insert function words (such as auxiliary verbs and prepositions) and punctuation marks. An important complication at this stage is that the output needs to include various linguistic components that may not be present in the input (an instance of the ‘generation gap’ discussed in Section 3.1 below); thus, this generation task can be thought of in terms of projection between non-isomorphic structures (cf. Ballesteros et al., 2015). Many different approaches have been proposed, of which we will discuss

1. human-crafted templates;

2. human-crafted grammar-based systems;

3. statistical approaches.
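Before turning to these approaches, the realisation subtasks described above (ordering, morphology and agreement, function words, punctuation) can be made concrete with a toy sketch. The input format, the function names and the crude pluralisation rule are all hypothetical; the input is not the structure shown in Figure 1d, and the sketch covers only a simplified version of the running example.

```python
NUMBER_WORDS = {1: "one", 2: "two", 3: "three"}
PAST_OF_BE = {"sg": "was", "pl": "were"}


def pluralise(noun):
    """Very rough English plural rule; real realisers rely on a lexicon."""
    if noun.endswith(("s", "x", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def realise_existential(spec):
    """Turn a small message into a past-tense 'there was/were ...' sentence."""
    count = spec["count"]
    number = "sg" if count == 1 else "pl"
    # Morphology: inflect the noun for number.
    noun = spec["noun"] if number == "sg" else pluralise(spec["noun"])
    # Function words and agreement: 'there' and the verb are absent from the
    # input and must be supplied; the verb agrees with the noun in number.
    words = ["there", PAST_OF_BE[number], NUMBER_WORDS.get(count, str(count))]
    # Ordering: modifiers precede the head noun.
    words += list(spec.get("modifiers", [])) + [noun]
    sentence = " ".join(words)
    # Orthography: capitalise the first letter and add final punctuation.
    return sentence[0].upper() + sentence[1:] + "."


message = {"noun": "bradycardia", "count": 3, "modifiers": ["successive"]}
print(realise_existential(message))  # There were three successive bradycardias.
```

Even in this tiny case, half of the output words (there, were) and the plural suffix come from the realiser rather than from the input, which is the gap between input and output that the paragraph above describes.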

2.6.1 Templates

When application domains are small and variation is expected to be minimal, realisation is a relatively easy task, and outputs can be specified using templates (e.g., Reiter et al., 1995; McRoy et al., 2003), such as the following.

(7) $player scored for $team in the $minute minute.

This template has three variables, which can be filled with the names of a player, a team, and the minute in which this player scored a goal. It can thus serve to generate sentences like:

(8) Ivan Rakitic scored for Barcelona in the 4th minute.
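Template (7) maps directly onto Python's standard `string.Template`, which uses the same `$name` placeholder syntax. The sketch below is only an illustration: the helper names `ordinal` and `realise_goal` are hypothetical, and passing the minute as an integer that is converted to an ordinal is an assumption about the input format.

```python
from string import Template

# Template (7), written with the same $variable placeholders.
GOAL_TEMPLATE = Template("$player scored for $team in the $minute minute.")


def ordinal(n):
    """Format an integer as an English ordinal, e.g. 4 -> '4th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def realise_goal(player, team, minute):
    """Fill the three slots of the template."""
    return GOAL_TEMPLATE.substitute(
        player=player, team=team, minute=ordinal(minute)
    )


print(realise_goal("Ivan Rakitic", "Barcelona", 4))
# Ivan Rakitic scored for Barcelona in the 4th minute.
```

`substitute` raises `KeyError` if a slot is left unfilled, so an incomplete input fails visibly instead of producing a sentence with a dangling placeholder.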

An advantage of templates is that they allow for full control over the quality of the output and avoid the generation of ungrammatical structures. Modern variants of the template-based method include syntactic information in the templates, as well as possibly complex rules for filling the gaps (Theune et al., 2001), making it difficult to distinguish templates from more sophisticated methods (van Deemter et al., 2005). The disadvantage of templates is that they are labour-intensive if constructed by hand (though templates have recently been learned automatically from corpus data, see e.g., Angeli et al., 2012; Kondadadi et al., 2013, and the discussion in Section 3.3 below). They also do not scale well to applications which require considerable linguistic variation.

2.6.2 Hand-coded grammar-based systems

An alternative to templates is provided by general-purpose, domain-independent realisation systems. Most of these systems are grammar-based, that is, they make some or all of their choices on the basis of a grammar of the language under consideration. This grammar can be manually written, as in many classic off-the-shelf realisers such as FUF/SURGE (Elhadad & Robin, 1996), MUMBLE (Meteer et al., 1987), KPML (Bateman, 1997), NIGEL (Mann & Matthiessen, 1983), and RealPro (Lavoie & Rambow, 1997). Hand-coded grammar-based realisers tend to require very detailed input. For example, KPML (Bateman, 1997) is based on Systemic-Functional Grammar (SFG; Halliday & Matthiessen, 2004), and realisation is modelled as a traversal of a network in which choices depend on both grammatical and semantico-pragmatic information. This level of detail makes these systems difficult to use as simple ‘plug-and-play’ or ‘off the shelf’ modules (e.g., Kasper, 1989), something which has motivated the development of simple realisation engines which provide syntax and morphology APIs, but leave choice-making up to the developer (Gatt et al., 2009; Vaudry & Lapalme, 2013; Bollmann, 2011; de Oliveira & Sripada, 2014).

One difficulty for grammar-based systems is how to make choices among related options, such as the following, where hand-crafted rules with the right sensitivity to context and input are difficult to design:

(9) Ivan Rakitic scored for Barcelona in the 4th minute.

(10) For Barcelona, Ivan Rakitic scored in minute four.

(11) Barcelona player Ivan Rakitic scored after four minutes.
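One way to choose among candidates such as (9)-(11) without hand-written rules is to score each one with corpus statistics, the idea taken up in Section 2.6.3. The sketch below ranks the three sentences with an add-one-smoothed bigram model. It is a toy illustration only: the two-sentence corpus is invented, the smoothing and length normalisation are assumptions, and none of this reproduces any particular published system.

```python
import math
from collections import Counter

# Invented toy corpus standing in for real in-domain text.
TOY_CORPUS = [
    "The striker scored for Barcelona in the 4th minute.",
    "The captain scored for Barcelona in the 9th minute.",
]

CANDIDATES = [
    "Ivan Rakitic scored for Barcelona in the 4th minute.",
    "For Barcelona, Ivan Rakitic scored in minute four.",
    "Barcelona player Ivan Rakitic scored after four minutes.",
]


def tokens(sentence):
    """Lowercase, split off commas, drop the final period, add boundary markers."""
    words = sentence.lower().replace(",", " ,").rstrip(".").split()
    return ["<s>"] + words + ["</s>"]


def train(corpus):
    """Collect bigram counts, history counts and a vocabulary size."""
    bigrams, histories, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        toks = tokens(sentence)
        vocab.update(toks)
        histories.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    return bigrams, histories, len(vocab) + 1  # +1 reserves mass for unseen words


def score(sentence, model):
    """Average add-one-smoothed bigram log-probability of a sentence."""
    bigrams, histories, vocab_size = model
    toks = tokens(sentence)
    pairs = list(zip(toks, toks[1:]))
    total = sum(
        math.log((bigrams[pair] + 1) / (histories[pair[0]] + vocab_size))
        for pair in pairs
    )
    return total / len(pairs)


model = train(TOY_CORPUS)
best = max(CANDIDATES, key=lambda s: score(s, model))
print(best)  # Ivan Rakitic scored for Barcelona in the 4th minute.
```

The ranking simply reflects what the corpus contains: with different training text, one of the other variants could come out on top, which is why such statistics can stand in for context-sensitive rules that are hard to write by hand.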

2.6.3 Statistical approaches

Recent approaches have sought to acquire probabilistic grammars from large corpora, cutting down on the amount of manual labour required, while increasing coverage. Essentially, two approaches have been taken to include statistical information in the realisation process. One approach, introduced by the seminal work of Langkilde and Knight (Langkilde-Geary, 2000; Langkilde-Geary & Knight, 2002) on the HALOGEN/NITROGEN systems, relies on a two-level approach, in which a small, hand-crafted grammar is used to generate alternative realisations represented as a forest, from which a stochastic re-ranker selects the optimal candidate. Langkilde and Knight rely on corpus-based statistical knowledge in the form of n-grams, whereas others have experimented with more sophisticated statistical models to perform reranking (e.g., Bangalore & Rambow, 2000; Ratnaparkhi, 2000; Cahill et al., 2007). The second approach does not rely on a computationally expensive generate-and-filter approach, but uses
