
Adversarial Feature Matching for Text Generation
Yizhe Zhang¹  Zhe Gan¹  Kai Fan¹  Zhi Chen¹  Ricardo Henao¹  Dinghan Shen¹  Lawrence Carin¹

¹Duke University, Durham, NC, 27708. Correspondence to: Yizhe Zhang <yizhe.zhang@duke.edu>.
Abstract
The Generative Adversarial Network (GAN) has achieved great success in generating realistic (real-valued) synthetic data. However, convergence issues and difficulties dealing with discrete data hinder the applicability of GAN to text. We propose a framework for generating realistic text via adversarial training. We employ a long short-term memory network as the generator and a convolutional network as the discriminator. Instead of using the standard objective of GAN, we propose matching the high-dimensional latent feature distributions of real and synthetic sentences via a kernelized discrepancy metric. This eases adversarial training by alleviating the mode-collapsing problem. Our experiments show superior performance in quantitative evaluation and demonstrate that our model can generate realistic-looking sentences.
1. Introduction
Generating meaningful and coherent sentences is central to many natural language processing applications. The general idea is to estimate a distribution over sentences from a corpus, then use it to sample realistic-looking sentences. This task is important because it enables generation of novel sentences that preserve the semantic and syntactic properties of real-world sentences, while being potentially different from any of the examples used to estimate the model. For instance, in the context of dialog generation, it is desirable to generate answers that are more diverse and less generic (Li et al., 2016).
One simple approach consists of first learning a latent space to represent (fixed-length) sentences using an encoder-decoder (autoencoder) framework based on Recurrent Neural Networks (RNNs) (Cho et al., 2014; Sutskever et al., 2014), then generating synthetic sentences by decoding random samples from this latent space. However, this approach often fails to generate realistic sentences from arbitrary latent representations. The reason for this is that, when mapping sentences to their latent representations using an autoencoder, the mappings usually cover a small but structured region of the latent space, which corresponds to a manifold embedding (Bowman et al., 2016). In practice, most regions of the latent space do not necessarily map (decode) to realistic sentences. Consequently, randomly sampling latent representations often yields nonsensical sentences. Recent work by Bowman et al. (2016) has attempted to generate more diverse sentences via RNN-based variational autoencoders. However, they did not address the fundamental problem that the posterior distribution over latent variables does not appropriately cover the latent space.
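To make this setup concrete, below is a minimal PyTorch sketch of the autoencoder-based approach: a GRU encoder maps a sentence to a fixed-length latent vector, and "generation" amounts to decoding a randomly drawn latent vector. All module names and hyperparameters are illustrative, not the implementation of any cited work; as discussed above, samples decoded this way often fall off the learned manifold and read as nonsense.

```python
# Minimal sketch (illustrative names) of sentence generation by decoding
# random samples from an RNN autoencoder's latent space.
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, HID_DIM, MAX_LEN = 10000, 128, 256, 20

class RNNAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.encoder = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)
        self.decoder = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM, VOCAB_SIZE)

    def encode(self, tokens):                  # tokens: (batch, seq_len)
        _, h = self.encoder(self.emb(tokens))  # h: (1, batch, HID_DIM)
        return h

    def decode(self, h, bos_id=1):
        # Greedy decoding from latent state h, feeding predictions back in.
        tok = torch.full((h.size(1), 1), bos_id, dtype=torch.long)
        words = []
        for _ in range(MAX_LEN):
            o, h = self.decoder(self.emb(tok), h)
            tok = self.out(o[:, -1]).argmax(-1, keepdim=True)
            words.append(tok)
        return torch.cat(words, dim=1)

model = RNNAutoencoder()
# "Generate" by decoding an arbitrary latent code; with a plain (trained)
# autoencoder, such a point likely lies off the encoder's learned manifold.
z = torch.randn(1, 1, HID_DIM)
print(model.decode(z))
```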
Another underlying challenge of generating realistic text relates to the nature of the RNN. During inference, the RNN generates words in sequence from previously generated words, in contrast to training, where ground-truth words are used at every step. As a result, errors accumulate in proportion to the length of the sequence: the first few words look reasonable, but quality deteriorates quickly as the sentence progresses. Bengio et al. (2015) coined this phenomenon exposure bias. Toward addressing this problem, Bengio et al. (2015) proposed the scheduled sampling approach. However, Huszár (2015) showed that scheduled sampling is a fundamentally inconsistent training strategy, in that it produces largely unstable results in practice.
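The mismatch can be seen directly in how the two decoding loops are conditioned. The sketch below (PyTorch assumed; `rnn_step`, a single-step decoder returning next-word logits and an updated hidden state, is a hypothetical stand-in) contrasts teacher-forced training with free-running inference.

```python
# Illustrative sketch of the training/inference mismatch behind exposure bias.
import torch
import torch.nn.functional as F

def free_running_generate(rnn_step, h, bos, steps=20):
    # Inference: each step is conditioned on the model's OWN previous
    # sample, so an early mistake is fed back in and compounds.
    tok, out = bos, []
    for _ in range(steps):
        logits, h = rnn_step(tok, h)                       # logits: (batch, vocab)
        tok = torch.multinomial(F.softmax(logits, -1), 1)  # (batch, 1)
        out.append(tok)
    return torch.cat(out, dim=1)

def teacher_forced_loss(rnn_step, h, gold):
    # Training: each step is conditioned on the GROUND-TRUTH previous word
    # (gold: (batch, seq_len)), so the model never sees its own errors.
    loss = 0.0
    for t in range(1, gold.size(1)):
        logits, h = rnn_step(gold[:, t - 1:t], h)
        loss = loss + F.cross_entropy(logits, gold[:, t])
    return loss / (gold.size(1) - 1)
```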
The Generative Adversarial Network (GAN) (Goodfellow et al., 2014) is an appealing and natural answer to the above issues. GAN matches the distributions of synthetic and real data by introducing an adversarial game between a generator and a discriminator. The GAN objective seeks to learn a generator that functionally maps samples from a given (simple) prior distribution to synthetic data that appear realistic.
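Concretely, writing $G$ for the generator and $D$ for the discriminator, the standard objective of Goodfellow et al. (2014), restated here for reference, is the minimax game

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))],$$

where $p_z$ denotes the simple prior over the generator's input noise and $G(z)$ is a synthetic sample.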
The GAN setup explicitly seeks that the latent representations from real data (via encoding) be distributed in a manner consistent with the specified prior (e.g., Gaussian or uniform). Due to the nature of adversarial training, the discriminator compares real and synthetic sentences, rather than their individual words, which in principle should alleviate the exposure-bias issue. Recent work (Lamb et al., 2016) has incorporated an additional discriminator to train a
sequence-to-sequence language model that better preserves long-term dependencies.