generated samples to improve the generator and could result
in mode collapse problems. Feature Matching (Zhang et al.
2017) provides a mechanism that matches the latent feature
distributions of real and generated sequences via a kernel-
ized discrepancy metric to alleviate the weak guidance and mode collapse problems. However, such enhancement only happens after the whole text sample has been generated, and thus the guiding signal is still sparse during training.
Reinforcement learning (RL), on the other hand, faces a similar difficulty when reward signals are sparse (Kulkarni
et al. 2016). Hierarchical RL is one of the promising tech-
niques for handling the sparse reward issue (Sutton, Precup,
and Singh 1999). A typical approach in hierarchical RL is
to manually identify the hierarchical structure for the agent
by defining several low-level sub-tasks and learning micro-
policies for each sub-task while learning a macro-policy for
choosing which sub-task to solve. Such methods can be very
effective when the hierarchical structure is known a priori
using domain knowledge in a given specific task, but fail
to flexibly adapt to other tasks. Recently, Vezhnevets et al. (2017) proposed an end-to-end framework for hierarchical RL where the sub-tasks are not identified manually but implicitly learned by a MANAGER module, which takes the current state as input and outputs a goal embedding vector to guide the low-level WORKER module.
In this work, we model the text generation procedure via
adversarial training and policy gradient (Yu et al. 2017). To
address the sparse reward issue in long text generation, we
follow (Vezhnevets et al. 2017) and propose a hierarchy de-
sign, i.e. MANAGER and WORKER, for the generator. As the
reward function in our case is a discriminative model rather
than a black box in (Vezhnevets et al. 2017), the high-level
feature extracted by the discriminator given the current gen-
erated word sequence is sent to the MANAGER module. As
such, the MANAGER module can also be viewed as a spy that leaks information from the discriminator to better guide the generator. To our knowledge, this is the first work that considers information leaking in the GAN framework for better training of the generator and combines it with hierarchical RL to address long text generation problems.
Methodology
We formalize the text generation problem as a sequen-
tial decision making process (Bachman and Precup 2015).
Specifically, at each timestep $t$, the agent takes the previously generated words as its current state, denoted as $s_t = (x_1, \ldots, x_i, \ldots, x_t)$, where $x_i$ represents a word token in the given vocabulary $V$. A $\theta$-parameterized generative net $G_\theta$, which corresponds to a stochastic policy, maps $s_t$ to a distribution over the whole vocabulary, i.e. $G_\theta(\cdot\,|\,s_t)$, from which the action $x_{t+1}$, i.e. the next word to select, is sampled. We also train a $\phi$-parameterized discriminative model $D_\phi$ that provides a scalar guiding signal $D_\phi(s_T)$ for $G_\theta$ to adjust its parameters once the whole sentence $s_T$ has been generated.
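For concreteness, the following minimal PyTorch-style sketch illustrates this rollout loop; `policy_net`, `discriminator`, `bos_id`, and `T` are illustrative placeholders standing in for $G_\theta$, $D_\phi$, a start token, and the target length, not the exact LeakGAN modules.

```python
import torch

# Hypothetical rollout of the sequential decision process described above.
# `policy_net(prefix) -> logits over V` and `discriminator(sequence) -> scalar`
# stand in for G_theta and D_phi; any modules with these interfaces would fit.

def rollout(policy_net, discriminator, bos_id, T, device="cpu"):
    state = torch.tensor([[bos_id]], device=device)           # s_t: previously generated words
    for _ in range(T):
        logits = policy_net(state)                            # parameters of G_theta(. | s_t)
        probs = torch.softmax(logits, dim=-1)                 # distribution over the vocabulary V
        next_word = torch.multinomial(probs, num_samples=1)   # sample the action x_{t+1}
        state = torch.cat([state, next_word], dim=1)          # append x_{t+1} to form s_{t+1}
    reward = discriminator(state)                             # scalar signal D_phi(s_T), only at the end
    return state, reward
```

Note that the scalar reward is available only after the full sequence has been produced, which is exactly the sparsity issue discussed next.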
As we discussed previously, although the above adversarial training is principled, the scalar guiding signal becomes relatively less informative as the sentence length $T$ grows. To address this, the proposed LeakGAN framework allows the discriminator $D_\phi$ to provide additional information, denoted as features $f_t$, of the current sentence $s_t$ (which is internally used by $D_\phi$ itself for discrimination) to the generator $G_\theta(\cdot\,|\,s_t)$. In LeakGAN, a hierarchical RL architecture is used as a promising mechanism to effectively incorporate such leaked information $f_t$ into the generation procedure of $G_\theta$ (also see Figure 1).
Leaked Features from D as Guiding Signals
Different from typical model-free RL settings where the reward function is a black box, our adversarial text generation uses $D_\phi$ as a learned reward function. Typically, $D_\phi$ is a neural network and can be decomposed into a feature extractor $\mathcal{F}(\cdot\,; \phi_f)$ and a final sigmoid classification layer with weight vector $\phi_l$. Mathematically, given input $s$, we have

$$D_\phi(s) = \mathrm{sigmoid}\big(\phi_l^\top \mathcal{F}(s; \phi_f)\big) = \mathrm{sigmoid}(\phi_l^\top f), \qquad (1)$$

where $\phi = (\phi_f, \phi_l)$ and $\mathrm{sigmoid}(z) = 1/(1 + e^{-z})$.
$f = \mathcal{F}(s; \phi_f)$ is the feature vector of $s$ in the last layer of $D_\phi$, which is to be leaked to the generator $G_\theta$. As shown in Eq. (1), for a given $D_\phi$, the reward value for each state $s$ mainly depends on the extracted features $f$. As such, the objective of getting a higher reward from $D_\phi$ is equivalent to finding a higher-reward region in the extracted feature space $\mathcal{F}(\mathcal{S}; \phi_f) = \{\mathcal{F}(s; \phi_f)\}_{s \in \mathcal{S}}$. Specifically, our feature extractor $\mathcal{F}(\cdot\,; \phi_f)$ in $D_\phi$ is implemented by a CNN (Zhang and LeCun 2015); thus $\mathcal{F}(s; \phi_f)$ outputs the CNN feature map vector as $f$ after its convolution-pooling-activation layers. Other neural network models such as an LSTM (Hochreiter and Schmidhuber 1997) can also be used to implement $D_\phi$.
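A minimal PyTorch-style sketch of such a feature-leaking discriminator is shown below; it exposes both the sigmoid score and the last-layer feature vector $f$. The embedding dimension, filter count, kernel size, and single convolution layer are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FeatureLeakingDiscriminator(nn.Module):
    # Sketch of D_phi(s) = sigmoid(phi_l^T F(s; phi_f)) with a CNN feature
    # extractor; all layer sizes are illustrative choices.

    def __init__(self, vocab_size, emb_dim=64, num_filters=128, kernel_size=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # F(.; phi_f): convolution -> activation -> pooling over the sequence
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size, padding=kernel_size // 2)
        # phi_l: final classification layer on top of the feature vector f
        self.classifier = nn.Linear(num_filters, 1)

    def extract_feature(self, tokens):
        # tokens: (batch, seq_len) word ids of the (partial) sentence s_t
        x = self.embed(tokens).transpose(1, 2)        # (batch, emb_dim, seq_len)
        h = torch.relu(self.conv(x))                  # (batch, num_filters, seq_len)
        f = torch.max(h, dim=2).values                # max-over-time pooling -> feature vector f
        return f

    def forward(self, tokens):
        f = self.extract_feature(tokens)              # the feature to be leaked to the generator
        prob = torch.sigmoid(self.classifier(f))      # D_phi(s) = sigmoid(phi_l^T f)
        return prob, f
```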
Compared to the scalar signal $D_\phi(s)$, the feature vector $f$ is a much more informative guiding signal for $G_\theta$, since it indicates where the currently generated words lie in the extracted feature space.
A Hierarchical Structure of G
At each step $t$ of the generation procedure, to utilize the leaked information $f_t$ from $D_\phi$, we follow hierarchical RL (Vezhnevets et al. 2017) and adopt a hierarchical architecture for $G_\theta$. Specifically, we introduce a MANAGER module, an LSTM that takes the extracted feature vector $f_t$ as its input at each step $t$ and outputs a goal vector $g_t$, which is then fed into the WORKER module to guide the generation of the next word so as to approach the higher-reward region in $\mathcal{F}(\mathcal{S}; \phi_f)$. Next we first describe the detailed generator model in LeakGAN and then show how the MANAGER and WORKER are trained with the guiding signals from $D_\phi$.
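As a preview, the sketch below shows one way such a MANAGER step could be realized, made precise in Eqs. (2)-(3) of the next paragraph: an LSTM cell consumes the leaked feature $f_t$ and emits an L2-normalized goal vector $g_t$. Taking $\hat{g}_t$ to be the LSTM output directly is a simplifying assumption, and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class Manager(nn.Module):
    # Sketch of the MANAGER: an LSTM cell consumes the leaked feature f_t and
    # emits an L2-normalized goal vector g_t. Taking g_hat_t to be the LSTM
    # output directly is a simplifying assumption; dimensions are illustrative.

    def __init__(self, feature_dim, goal_dim):
        super().__init__()
        self.goal_dim = goal_dim
        self.cell = nn.LSTMCell(feature_dim, goal_dim)        # M(.; theta_m)

    def init_state(self, batch_size, device="cpu"):
        # all-zero initial hidden and cell states (h_0^M)
        zeros = torch.zeros(batch_size, self.goal_dim, device=device)
        return (zeros, zeros.clone())

    def forward(self, f_t, state):
        h_m, c_m = self.cell(f_t, state)                      # g_hat_t, h_t^M = M(f_t, h_{t-1}^M)
        g_t = h_m / (h_m.norm(dim=-1, keepdim=True) + 1e-8)   # g_t = g_hat_t / ||g_hat_t||
        return g_t, (h_m, c_m)
```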
Generation Process. The MANAGER and WORKER modules both start from an all-zero hidden state, denoted as $h_0^M$ and $h_0^W$ respectively. At each step, the MANAGER receives the leaked feature vector $f_t$ from the discriminator $D_\phi$, which is further combined with the current hidden state of the MANAGER to produce the goal vector $g_t$ as

$$\hat{g}_t,\ h_t^M = \mathcal{M}(f_t, h_{t-1}^M; \theta_m), \qquad (2)$$
$$g_t = \hat{g}_t / \lVert \hat{g}_t \rVert, \qquad (3)$$