Figure 1: LSTM VAE model of (Bowman et al., 2016)
forward part is composed of a fully convolutional encoder and a decoder that combines deconvolutional layers and a conventional RNN. Finally, we discuss optimization recipes that help the VAE respect its latent variables, which is critical for training a model with a meaningful latent space and for sampling realistic sentences.
3.1 Variational Autoencoder
The VAE is a recently introduced latent vari-
able generative model, which combines varia-
tional inference with deep learning. It modifies the
conventional autoencoder framework in two key
ways. Firstly, the deterministic internal representation z (provided by the encoder) of an input x is replaced with a posterior distribution q(z|x). Inputs are then reconstructed by sampling z from this posterior and passing it through a decoder. To make sampling easy, the posterior is usually parametrized as a Gaussian whose mean and variance are predicted by the encoder. Secondly,
to ensure that we can sample from any point of
the latent space and still generate valid and diverse
outputs, the posterior q(z|x) is regularized with
its KL divergence from a prior distribution p(z).
The prior is typically also chosen to be a Gaussian with zero mean and unit variance, such that the KL term between the posterior and the prior can be computed in closed form (Kingma and Welling, 2013). The total VAE cost is composed of the reconstruction term, i.e., the negative log-likelihood of the data, and the KL regularizer:
J_vae = KL(q(z|x) || p(z)) − E_{q(z|x)}[log p(x|z)]        (1)
Kingma and Welling (2013) show that the loss function in Eq (1) can be derived from a probabilistic modeling perspective and that it is an upper bound on the true negative log-likelihood of the data.
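For completeness, with a diagonal Gaussian posterior q(z|x) = N(μ(x), diag(σ²(x))) and a standard normal prior, the KL term in Eq (1) has the standard closed form

KL(q(z|x) || p(z)) = 1/2 Σ_i (μ_i² + σ_i² − log σ_i² − 1),

which vanishes only when the predicted posterior matches the prior exactly for the given input.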
One can view a VAE as a traditional Autoencoder with some restrictions imposed on the internal representation space. Specifically, using a sample from q(z|x) to reconstruct the input instead of a deterministic z forces the model to map an input to a region of the space rather than to a single point. The most straightforward way to achieve a low reconstruction error in this case is to predict a very sharp probability distribution, effectively corresponding to a single point in the latent space (Raiko et al., 2014). The additional KL
term in Eq (1) prevents this behavior and forces the
model to find a solution with, on one hand, low re-
construction error and, on the other, predicted pos-
terior distributions close to the prior. Thus, the de-
coder part of the VAE is capable of reconstructing
a sensible data sample from every point in the la-
tent space that has non-zero probability under the
prior. This allows for straightforward generation
of novel samples and linear operations on the la-
tent codes. Bowman et al. (2016) demonstrate that this does not work in the fully deterministic Autoencoder framework. In addition to regularizing the latent space, the KL term indicates how much information the VAE stores in the latent vector.
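As an illustration only (not the implementation used in this work), the sketch below computes the objective in Eq (1) for a diagonal Gaussian posterior in PyTorch; the function and argument names are hypothetical:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # Sample z ~ q(z|x) differentiably (reparameterization trick).
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

def vae_loss(mu, logvar, logits, targets, kl_weight=1.0):
    """Eq (1): reconstruction NLL plus KL(q(z|x) || p(z)) with p(z) = N(0, I).

    mu, logvar -- posterior parameters from the encoder, shape (batch, d)
    logits     -- decoder outputs over the vocabulary, shape (batch, seq_len, vocab)
    targets    -- reference token ids, shape (batch, seq_len)
    """
    # Closed-form KL between N(mu, sigma^2) and N(0, I), summed over latent dims.
    kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=1)

    # Reconstruction term: negative log-likelihood of the target tokens.
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          targets.reshape(-1),
                          reduction='none').view(targets.size(0), -1).sum(dim=1)

    # kl_weight < 1 corresponds to the KL annealing discussed in Section 3.4.
    return (nll + kl_weight * kl).mean(), kl.mean()
```

Reporting the KL term separately, as in this sketch, is the usual way to monitor how much information the model stores in z.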
Bowman et al. (2016) propose a VAE model for
text generation where both encoder and decoder
are LSTM networks (Figure 1). We will refer to
this model as LSTM VAE in the remainder of the
paper. The authors show that adapting VAEs to text generation is more challenging, since the decoder tends to ignore the latent vector (the KL term is close to zero) and falls back to behaving as a plain language model. Two training tricks are required to mitigate this issue: (i) KL-term annealing, where the weight of the KL term in Eq (1) is gradually increased from 0 to 1 during training; and (ii) applying dropout to the inputs of the decoder to limit its expressiveness and thereby force the model to rely more on the latent variables.
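A minimal sketch of these two tricks, assuming a linear annealing schedule and word dropout that replaces decoder input tokens with an unknown-word token; the schedule length and dropout rate below are illustrative, not the settings of Bowman et al. (2016):

```python
import torch

def kl_weight(step, anneal_steps=10000):
    # (i) KL-term annealing: the KL weight grows linearly from 0 to 1.
    return min(1.0, step / anneal_steps)

def word_dropout(token_ids, unk_id, rate=0.3):
    # (ii) Decoder input dropout: randomly replace input tokens with <unk>,
    # limiting the decoder's expressiveness so it relies more on z.
    mask = torch.rand(token_ids.shape, device=token_ids.device) < rate
    return token_ids.masked_fill(mask, unk_id)
```

The returned weight would multiply the KL term in Eq (1) (the kl_weight argument in the earlier sketch).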
We will discuss these tricks in more detail in Sec-
tion 3.4. Next we describe a deconvolutional layer,
which is the core element of the decoder in our
VAE model.
3.2 Deconvolutional Networks
A deconvolutional layer (also referred to as a transposed convolution (Gulrajani et al., 2016) or a fractionally strided convolution (Radford et al., 2015)) performs spatial up-sampling of its inputs
and is an integral part of latent variable genera-
tive models of images (Radford et al., 2015; Gulra-
jani et al., 2016) and semantic segmentation algo-
rithms (Noh et al., 2015). Its goal is to perform an “inverse” convolution operation and increase the spatial size of the input while decreasing the number of feature maps. This operation can be viewed as