generative adversarial network (WGAN) [24], and utilising a perceptual loss in addition to the
MSE loss. In the next section, we will describe the implementation of these methods.
2.2. Perceptual loss
The perceptual loss is based on high-level features extracted from a pre-trained network [25].
This encourages the network to reproduce image similarities more robustly than per-pixel losses alone. The perceptual loss is defined as the Euclidean distance between the feature representations of the enhanced image ($G_{\theta_G}(I_R)$) and the frame-averaged reference image ($I_{FA}$), given by a pre-trained VGG19 network [26]:
$$
L_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I_{FA})_{x,y} - \phi_{i,j}\big(G_{\theta_G}(I_R)\big)_{x,y} \right)^2 \tag{2}
$$
where $\phi_{i,j}$ indicates the feature map obtained by the $j$-th convolution, after ReLU activation, prior to the $i$-th pooling layer, and $W_{i,j}$ and $H_{i,j}$ describe the dimensions of the respective feature maps within the VGG network.
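To make Eq. (2) concrete, the following is a minimal PyTorch sketch of this loss using the torchvision VGG19 model. The specific layer slice (conv5_4 after ReLU, i.e. $\phi_{5,4}$) and the replication of single-channel B-scans to three channels are our assumptions; the paper leaves $i$ and $j$ generic at this point.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# A minimal sketch of the perceptual loss in Eq. (2), assuming PyTorch.
# The slice features[:36] (conv5_4 after ReLU, before the 5th pooling layer)
# is an assumed choice of phi_{i,j}; the paper does not fix i and j here.
class VGGPerceptualLoss(torch.nn.Module):
    def __init__(self, layer_index: int = 36):
        super().__init__()
        self.phi = vgg19(weights="IMAGENET1K_V1").features[:layer_index].eval()
        for p in self.phi.parameters():
            p.requires_grad = False  # the VGG network stays frozen

    def forward(self, enhanced: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        # enhanced = G_{theta_G}(I_R), reference = I_FA, shape (N, 1, H, W);
        # grayscale B-scans are replicated to the 3 channels VGG expects.
        e3 = enhanced.repeat(1, 3, 1, 1)
        r3 = reference.repeat(1, 3, 1, 1)
        # Mean squared feature difference, i.e. the sum of squares in Eq. (2)
        # normalised by the feature-map size W_{i,j} * H_{i,j}.
        return F.mse_loss(self.phi(e3), self.phi(r3))
```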
2.3. Adversarial loss
Along with the generator network, a generative adversarial network (GAN) involves a discriminator
network, $D_{\theta_D}$, parametrised by $\theta_D$ (shown in Fig. 1). The generator network is trained to
produce realistic images, while the discriminator network is trained to identify which images are
real versus those that are generated. Here, we implement a WGAN, an improved version of the
original GAN, which uses the Earth Mover’s distance [27] to compare two data distributions (that
of $I_{FA}$ and $I_E$). We optimise both networks in an alternating manner (fixing one and updating
the other) to solve the following min-max problem:
$$
\min_{\theta_G} \max_{\theta_D} L_{WGAN}(D, G) = -\mathbb{E}_{I_{FA}}\big[D(I_{FA})\big] + \mathbb{E}_{I_R}\big[D(G(I_R))\big] + \lambda \, \mathbb{E}_{\hat{I}_{FA}}\Big[\big(\lVert \nabla_{\hat{I}_{FA}} D(\hat{I}_{FA}) \rVert_2 - 1\big)^2\Big], \tag{3}
$$
where the first two terms represent the estimation of the Wasserstein distance, and the final term is a gradient penalty [28] that enforces the Lipschitz constraint, with penalty coefficient $\lambda$; the gradient penalty has been shown to improve convergence of the WGAN compared to gradient clipping. $\hat{I}_{FA}$ is sampled uniformly along straight lines between pairs of $I_E$ and $I_{FA}$ samples. This results in improved stability during training. With this approach, our generator can learn to create solutions that are highly similar to real images and thus difficult for $D$ to classify.
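As an illustration, below is a minimal PyTorch sketch of the critic-side loss of Eq. (3), following the standard WGAN-GP formulation [28]; the function and variable names are placeholders rather than the paper's implementation.

```python
import torch

# A minimal sketch of the WGAN-GP critic loss from Eq. (3), assuming PyTorch;
# D, G, and tensor shapes (N, C, H, W) are placeholders.
def critic_loss(D, G, i_fa: torch.Tensor, i_r: torch.Tensor, lam: float = 10.0):
    """Loss minimised by the critic D (the max over theta_D in Eq. (3))."""
    i_e = G(i_r).detach()  # enhanced image I_E = G(I_R); G is fixed here

    # Wasserstein-distance estimate: -E[D(I_FA)] + E[D(G(I_R))]
    wasserstein = -D(i_fa).mean() + D(i_e).mean()

    # I_hat sampled uniformly along straight lines between I_FA / I_E pairs
    eps = torch.rand(i_fa.size(0), 1, 1, 1, device=i_fa.device)
    i_hat = (eps * i_fa + (1.0 - eps) * i_e).requires_grad_(True)

    # Gradient penalty (||grad_{I_hat} D(I_hat)||_2 - 1)^2, enforcing the
    # 1-Lipschitz constraint on D
    d_hat = D(i_hat)
    grads = torch.autograd.grad(d_hat, i_hat,
                                grad_outputs=torch.ones_like(d_hat),
                                create_graph=True)[0]
    penalty = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return wasserstein + lam * penalty
```

In a training loop, this critic loss and the generator loss are minimised in alternation, with one network held fixed while the other is updated, as described above.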
Thus, the overall loss of the CNN-WGAN architecture is given by:
$$
\min_{\theta_G} \max_{\theta_D} \; \lambda_1 L_{WGAN}(D, G) + \lambda_2 L_{VGG}(G) + L_{MSE}(G), \tag{4}
$$
where $\lambda_1$ and $\lambda_2$ are weighting parameters to control the trade-off between the three components of the loss.
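Read from the generator's side, Eq. (4) reduces to a weighted sum of the adversarial, perceptual, and per-pixel terms. A minimal sketch, assuming PyTorch, the two helpers above, and placeholder values for $\lambda_1$ and $\lambda_2$ (the paper's tuned weights are not reproduced here):

```python
import torch
import torch.nn.functional as F

# Sketch of the generator's side of Eq. (4); lam1 and lam2 are placeholder
# values, not the weights used in the paper.
def generator_loss(D, G, i_r, i_fa, perceptual_loss,
                   lam1: float = 1e-3, lam2: float = 1.0):
    i_e = G(i_r)                       # enhanced image I_E
    adv = -D(i_e).mean()               # generator's WGAN term: make I_E score as real
    vgg = perceptual_loss(i_e, i_fa)   # L_VGG from Eq. (2)
    mse = F.mse_loss(i_e, i_fa)        # per-pixel L_MSE
    return lam1 * adv + lam2 * vgg + mse
```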
3. Experiments
3.1. Data acquisition and pre-processing
Six OCT volumes were acquired from both eyes of 38 healthy patients on a Cirrus HD-OCT
Scanner (Zeiss, Dublin, CA) at a single visit. The scans were centred on the optic nerve head
(ONH) and were 200 × 200 × 1024 voxels per cube, acquired from a 6 mm × 6 mm × 2 mm region. These scans were then registered and averaged to create the “ground truth” denoised image. The
scan with the highest signal strength (as provided by the scanner software) was chosen as the