ZENG et al.: CDA FOR SINGLE IMAGE SR 29
given a set of samples Y = [y_1, y_2, ..., y_N], where y_i ∈ R^d,
the training objective of an autoencoder is to minimize the
reconstruction error

    Σ_i ||y_i − ŷ_i||^2    (1)
where y_i and ŷ_i are the original input and the reconstructed
input, respectively. The hidden layer implies an encoding
process and a decoding process

    h_i = f(W y_i + b),  ŷ_i = f(W′ h_i + b′)    (2)
where h_i ∈ R^n is the compact representation, W and W′
represent the weight matrices for the encoding and decoding layers,
and b and b′ denote the bias terms. f(·) is the activation function,
which we set as the sigmoid function in this paper

    f(z) = 1 / (1 + exp(−z)).    (3)
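To make (1)–(3) concrete, the following is a minimal NumPy sketch of an autoencoder's forward pass and reconstruction error; the dimensions and the randomly initialized weights are illustrative assumptions, not trained values.

```python
import numpy as np

def sigmoid(z):
    # (3): f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, n, N = 64, 32, 100                    # input dim, hidden dim, sample count (illustrative)
Y = rng.random((d, N))                   # samples y_i stacked as columns

W = rng.standard_normal((n, d)) * 0.1    # encoding weights W (untrained, for illustration)
b = np.zeros((n, 1))
W_p = rng.standard_normal((d, n)) * 0.1  # decoding weights W'
b_p = np.zeros((d, 1))

H = sigmoid(W @ Y + b)                   # encoding in (2): h_i = f(W y_i + b)
Y_hat = sigmoid(W_p @ H + b_p)           # decoding in (2): y_hat_i = f(W' h_i + b')

loss = np.sum((Y - Y_hat) ** 2)          # reconstruction error (1)
```

Training would then minimize `loss` over (W, W′, b, b′), e.g. by gradient descent.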
The autoencoders can induce very useful representations
of the inputs. However, they can only handle a single sam-
ple and cannot model the relationship between a sample pair.
In image SR, we are interested in the joint task of discov-
ering suitable representations for image pairs and encoding
their relationship. We argue that a better representation should
depend not only on the input image but also on the internal
relationship between the HR/LR image pairs. With this in
mind, we develop the CDA.
A. CDA
CDA has a three-stage architecture, as shown in Fig. 1. The
first and third stages employ two autoencoders for learning
the representations of LR and HR image patches, respectively.
The second stage incorporates a one-layer neural network to
transform the LR representation into the HR representation.
Following the above notations, the two autoencoders generate
the hidden representations h^L and h^H, which we term the
intrinsic representations of the LR and HR input, respectively.
Given the LR input y_i and the corresponding HR input x_i, the
intrinsic representations can be obtained by

    h^L_i = f(W_1 y_i + b_1)    (4)
    h^H_i = f(W_3 x_i + b_3).    (5)
For reconstruction, the decoding processes imply that

    ŷ_i = f(W′_1 h^L_i + b′_1)    (6)
    x̂_i = f(W′_3 h^H_i + b′_3).    (7)
The parameters (W_1, W′_1, b_1, b′_1) characterize the LR autoencoder
(LRAE) while (W_3, W′_3, b_3, b′_3) parameterize the HR
autoencoder.
After obtaining the LR/HR intrinsic representations, the
neural network implements the mapping from h^L to h^H.
Mathematically, let us denote the parameters in this stage as
(W_2, b_2), where W_2 is the weight matrix and b_2 is the bias
term. The mapping function then becomes

    h^H_i = f(W_2 h^L_i + b_2).    (8)
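Since (8) is a single sigmoid layer, the mapping stage can be sketched in a few lines of NumPy; the dimensions n_L, n_H and the randomly initialized parameters below are illustrative assumptions (trained values would come from the procedure in Section II-C).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_L, n_H = 32, 48                           # illustrative sizes of the LR/HR intrinsic representations
h_L = rng.random((n_L, 1))                  # an LR intrinsic representation, as from (4)

W2 = rng.standard_normal((n_H, n_L)) * 0.1  # mapping weights (untrained, for illustration)
b2 = np.zeros((n_H, 1))

h_H = sigmoid(W2 @ h_L + b2)                # (8): predicted HR intrinsic representation
```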
Algorithm 1 CDA for SR
Input: an LR image Y and a well-trained CDA model
{W_1, W_2, W_3, b_1, b_2, b_3}
Output: the HR image X̂
Step 1: Extract low-resolution image patches y_i using (9);
Step 2: for each image patch y_i
  Step 2.1: Obtain the LR intrinsic representation h^L_i by (4);
  Step 2.2: Obtain the HR intrinsic representation h^H_i by (8);
  Step 2.3: Obtain the HR image patch x̂_i by (7);
Step 3: Reconstruct the HR image X̂ using (10).
The construction of the CDA suggests that the model is simple
and flexible. The autoencoders ensure that the intrinsic representations
fit the LR and HR images well, and the neural
network can learn complex relationships between the LR/HR
representations; notably, the mapping function and the intrinsic
representations are jointly optimized and thus correlated.
Therefore, the constructed architecture is a data-driven model
for single image SR. Note that we can replace the autoencoder
with the stacked autoencoder [46] or the de-noising
autoencoder [47] to obtain further performance improvement.
B. Super-Resolution by CDA
For single image SR, CDA is a three-layer forward network
employing a fast feed-forward process, as shown in Fig. 1. The
SR steps are as follows.
Following the preprocessing step found in most SR meth-
ods, a single LR image is first upscaled to the desired size
using bi-cubic interpolation. To avoid confusion, this interpolated
LR image is denoted by Y. The LR image patches
y_i (i = 1, 2, ..., N) are obtained through

    y_i = R_i Y    (9)

where R_i is the operator to extract the ith local patch in Y.
Taking y_i as the input of CDA, the forward process incorporates
(4), (8), and (7) to infer the LR intrinsic representation h^L_i,
the HR intrinsic representation h^H_i, and the final restored HR
patch x̂_i, respectively. To estimate the whole HR image X̂,
we merge all restored patches by averaging the overlapping
regions between adjacent patches

    X̂ = (Σ_i R_i^T R_i)^−1 Σ_i R_i^T x̂_i.    (10)
Algorithm 1 describes CDA for SR in detail. The dimensions
of the hidden units in each layer are discussed in Section III.
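Because R_i simply selects a square window of pixels, Σ_i R_i^T R_i is diagonal (each diagonal entry counts how many patches cover that pixel), so (10) reduces to per-pixel averaging of the overlapping restored patches. A minimal sketch of the extraction (9) and merging (10) under that observation, with illustrative patch sizes:

```python
import numpy as np

def extract_patches(Y, p, stride):
    """(9): y_i = R_i Y -- slide a p x p window over Y."""
    patches, coords = [], []
    H, W = Y.shape
    for r in range(0, H - p + 1, stride):
        for c in range(0, W - p + 1, stride):
            patches.append(Y[r:r + p, c:c + p].ravel())
            coords.append((r, c))
    return patches, coords

def merge_patches(patches, coords, shape, p):
    """(10): accumulate R_i^T x_hat_i, then divide by the per-pixel overlap count."""
    acc = np.zeros(shape)              # sum_i R_i^T x_hat_i
    cnt = np.zeros(shape)              # diagonal of sum_i R_i^T R_i
    for x_hat, (r, c) in zip(patches, coords):
        acc[r:r + p, c:c + p] += x_hat.reshape(p, p)
        cnt[r:r + p, c:c + p] += 1.0
    return acc / np.maximum(cnt, 1.0)  # guard uncovered pixels, if any

Y = np.arange(25, dtype=float).reshape(5, 5)
patches, coords = extract_patches(Y, p=3, stride=2)
X_hat = merge_patches(patches, coords, Y.shape, p=3)  # recovers Y exactly here
```

Feeding the extracted patches straight back into the merge reproduces Y, which is a handy sanity check for the averaging in (10).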
C. Training CDA
CDA needs to discover the LR/HR intrinsic representations
and simultaneously join them using a well-trained mapping
function. For this purpose, we have designed a two-part
training procedure: the first part is initialization (stages 1–3
in Fig. 2), and the second part is the fine-tuning implemented
in stage 4.
1) Initialization: To train CDA, the intrinsic representations
of the LR/HR inputs are first generated. According to
the autoencoder introduced in the beginning of Section II,