Fig. 2. The schematic diagram of the loss function in the loss-sensitive GAN. $\Delta(x, G(z))$ represents the distance between $x$ and $G(z)$, which can be a p-norm. As the generator $G$ iterates, the similarity between the generated samples and the real samples increases, so the loss function can more reasonably address the vanishing gradient problem.
an infinite modeling ability, so there is no restriction on the distribution of real samples, which may lead to the vanishing gradient problem. The loss-sensitive GAN learns a loss function $L_\theta(x)$ parameterized by $\theta$, which restricts the modeling ability of the discriminator $D$. The loss function is learned with a data-dependent margin, under the assumption that the loss of the real data distribution should be smaller than that of the generated one. In this way, the loss-sensitive GAN proves that, for Lipschitz densities, the distribution of the generated samples converges to the distribution of the real samples. The objective function of the loss-sensitive GAN is defined in Eqs. (13) and (14),
$$\min_{D} V(D) = \mathbb{E}_{x \sim p_{data}(x)}\!\left[ L_\theta(x) \right] + \lambda\, \mathbb{E}_{\substack{x \sim p_{data}(x) \\ z \sim p_z(z)}}\!\left[ \Delta(x, G(z)) + L_\theta(x) - L_\theta(G(z)) \right]_{+}, \tag{13}$$

$$\min_{G} V(G) = \mathbb{E}_{z \sim p_z(z)}\!\left[ L_\theta(G(z)) \right], \tag{14}$$
where $(a)_{+} = \max(a, 0)$, $\Delta(x, G(z))$ represents the margin measuring the difference between $x$ and $G(z)$, and $\lambda$ is a balancing parameter. Fig. 2 illustrates this idea in detail. As can be seen from Fig. 2, if the generated data distribution is close to the real one, it is no longer treated as a negative sample, and more effort can be concentrated on improving the samples that are far away from the real samples. It should be noted that the margin is not a fixed constant; it is a similarity function chosen for the specific experiment, such as a p-norm. When a generated sample is very close to a real one in the metric space, the margin vanishes.
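As a concrete illustration, the following is a minimal PyTorch sketch of the two objectives above; it is not the original implementation. It assumes that `loss_net` is a network implementing $L_\theta$ that maps a batch of samples to per-sample scalar losses, that real and generated samples are paired within a batch, and that the margin $\Delta$ is an L1 norm; all names are illustrative.

```python
import torch

def lsgan_d_loss(loss_net, x_real, x_fake, lam=1.0):
    """Loss-sensitive D objective, Eq. (13): the learned loss should be
    smaller on real samples than on generated ones by the margin Delta."""
    l_real = loss_net(x_real)  # L_theta(x), shape (batch,)
    l_fake = loss_net(x_fake)  # L_theta(G(z)), shape (batch,)
    # Margin Delta(x, G(z)): per-sample L1 distance (one p-norm choice);
    # assumes x_real and x_fake are paired batches of equal shape.
    delta = (x_real - x_fake).abs().flatten(1).sum(dim=1)
    # Hinge (.)_+: pairs that already satisfy the margin contribute
    # nothing, so well-modelled samples stop acting as negatives.
    margin_term = torch.relu(delta + l_real - l_fake)
    return (l_real + lam * margin_term).mean()

def lsgan_g_loss(loss_net, x_fake):
    """Generator objective, Eq. (14): minimize L_theta(G(z))."""
    return loss_net(x_fake).mean()
```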
(b) Maximum mean discrepancy (MMD)
Assume that $\chi$ is a non-empty metric space, $\mathcal{F}$ is a class of functions $f: \chi \to \mathbb{R}$, and $X \sim p$, $Y \sim q$. The maximum mean discrepancy (MMD) [25] between $p$ and $q$ is defined in Eq. (15),

$$\mathrm{MMD}[\mathcal{F}, p, q] = \sup_{f \in \mathcal{F}} \left( \mathbb{E}[f(X)] - \mathbb{E}[f(Y)] \right). \tag{15}$$
The reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ is an infinite-dimensional function space. Each $f \in \mathcal{H}$ is associated with a reproducing kernel $k$, and $f$ can be formulated as,

$$f(x) = \langle f, k(\cdot, x) \rangle_{\mathcal{H}} = \sum_i \alpha_i\, k(x, x_i). \tag{16}$$
If $\mathcal{F}$ is chosen to be the RKHS $\mathcal{H}$, the mean embedding $\mu_p$ of $p$ is calculated as follows,

$$\mu_p = \int_{\chi} k(x, \cdot)\, p(dx) \in \mathcal{H}. \tag{17}$$
For each $f \in \mathcal{H}$, $\mathbb{E}[f(X)] = \langle f, \mu_p \rangle_{\mathcal{H}}$, so the MMD can be formulated as a mean feature matching problem, as shown in Eq. (18),

$$\begin{aligned}
\mathrm{MMD}[\mathcal{F}, p, q] &= \sup_{\|f\|_{\mathcal{H}} \le 1} \left( \mathbb{E}_p[f(x)] - \mathbb{E}_q[f(y)] \right)\\
&= \sup_{\|f\|_{\mathcal{H}} \le 1} \left( \mathbb{E}_p \langle \varphi(x), f \rangle_{\mathcal{H}} - \mathbb{E}_q \langle \varphi(y), f \rangle_{\mathcal{H}} \right)\\
&= \sup_{\|f\|_{\mathcal{H}} \le 1} \langle \mu_p - \mu_q, f \rangle_{\mathcal{H}}\\
&= \| \mu_p - \mu_q \|_{\mathcal{H}},
\end{aligned} \tag{18}$$
where $\varphi(\cdot)$ denotes the feature map that embeds $x$ into $\mathcal{H}$, $\mu_p = \mathbb{E}_p[\varphi(x)]$, and $\mu_q = \mathbb{E}_q[\varphi(y)]$. The MMD was first proposed for the two-sample test problem, i.e., to determine whether two distributions $p$ and $q$ differ. In practical applications, the square of the MMD is generally used, and it is defined as,
$$\begin{aligned}
\mathrm{MMD}^2[\mathcal{F}, p, q] &= \langle \mu_p - \mu_q, \mu_p - \mu_q \rangle_{\mathcal{H}}\\
&= \langle \mu_p, \mu_p \rangle_{\mathcal{H}} + \langle \mu_q, \mu_q \rangle_{\mathcal{H}} - 2 \langle \mu_p, \mu_q \rangle_{\mathcal{H}}\\
&= \mathbb{E}_p \langle \varphi(x), \varphi(x') \rangle_{\mathcal{H}} - 2\, \mathbb{E}_{p,q} \langle \varphi(x), \varphi(y) \rangle_{\mathcal{H}} + \mathbb{E}_q \langle \varphi(y), \varphi(y') \rangle_{\mathcal{H}}.
\end{aligned} \tag{19}$$
Using the kernel trick, each inner product in Eq. (19) can be evaluated through a kernel function $k(x, y) = \langle \varphi(x), \varphi(y) \rangle_{\mathcal{H}}$. Various kernel functions can be chosen, such as the linear kernel, the Gaussian kernel, and the Laplacian kernel.
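To make Eq. (19) and the kernel trick concrete, here is a small NumPy sketch of the (biased) squared-MMD estimator with a Gaussian kernel. The bandwidth `sigma` is an assumption added for generality; the fixed kernel $\exp(-\|x - y\|^2)$ used below in GMMN corresponds to omitting the $2\sigma^2$ factor.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimator of Eq. (19):
    E_p[k(x, x')] - 2 E_{p,q}[k(x, y)] + E_q[k(y, y')]."""
    return (gaussian_kernel(x, x, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean())

# Two-sample test intuition: samples from the same distribution give a
# near-zero MMD^2, while a shifted distribution gives a larger value.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 2))
print(mmd2(x, rng.normal(0.0, 1.0, size=(500, 2))))  # close to 0
print(mmd2(x, rng.normal(2.0, 1.0, size=(500, 2))))  # clearly larger
```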
Based on the fixed Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2)$, Li et al. [18] proposed the generative moment matching networks (GMMN), which measure the discrepancy between the two distributions in GANs by minimizing the MMD distance. Unlike regular GANs, the GMMN uses an autoencoder instead of a discriminator to estimate the discrepancy between the two distributions. Although this improves the stability of the generated samples during training, the training efficiency of GMMN is not satisfactory. To improve the generalization ability and computational efficiency of GMMN, the MMDGAN [19] replaced the static fixed Gaussian kernel with an adversarially learned kernel. The learned kernel consists of a Gaussian kernel composed with an injective function $f_\phi$, where $k(x, y) = \exp(-\|f_\phi(x) - f_\phi(y)\|^2)$. In addition, to encourage $f_\phi$ to be injective, they used an autoencoder in the discriminator.
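A compact PyTorch sketch of this adversarially learned kernel is given below, under stated assumptions: `f_phi` stands for any embedding network producing flat feature vectors, and the autoencoder reconstruction term used to enforce injectivity is omitted. In MMDGAN, the critic $f_\phi$ is trained to maximize this squared MMD while the generator minimizes it.

```python
import torch

def learned_kernel(f_phi, a, b):
    """MMDGAN kernel k(x, y) = exp(-||f_phi(x) - f_phi(y)||^2),
    where f_phi is a learned (ideally injective) embedding network."""
    fa, fb = f_phi(a), f_phi(b)            # embed both sample sets
    sq_dists = torch.cdist(fa, fb) ** 2    # pairwise squared distances
    return torch.exp(-sq_dists)

def mmd2_learned(f_phi, x_real, x_fake):
    """Biased squared MMD under the learned kernel (cf. Eq. (19));
    the autoencoding penalty on f_phi is omitted in this sketch."""
    return (learned_kernel(f_phi, x_real, x_real).mean()
            - 2.0 * learned_kernel(f_phi, x_real, x_fake).mean()
            + learned_kernel(f_phi, x_fake, x_fake).mean())
```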
3) Other objective function methods: In addition to using the Lipschitz density assumption to constrain the sample distribution, non-probabilistic criteria can also be used to measure GANs. The energy-based GAN (EBGAN) [20] is a typical method of this form. Unlike the discriminator used in regular GANs, the discriminator