IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 12, NO. 5, MAY 2015 1037
A New Pan-Sharpening Method With
Deep Neural Networks
Wei Huang, Student Member, IEEE, Liang Xiao, Member, IEEE, Zhihui Wei, Hongyi Liu, and Songze Tang
Abstract—A new pan-sharpening method based on a deep neural
network (DNN) is proposed in this letter for the remote sensing
image fusion problem. Research on representation learning
suggests that the DNN can effectively model complex relationships
between variables via the composition of several levels of
nonlinearity. Inspired by this observation, a modified sparse denoising
autoencoder (MSDA) algorithm is proposed to learn the relationship
between high-resolution (HR) and low-resolution (LR) image
patches, which can be represented by the DNN. The HR/LR image
patches are sampled only from the HR/LR panchromatic (PAN) images
at hand, respectively, without requiring other training images.
By connecting a series of MSDAs, we obtain a stacked MSDA
(S-MSDA), which can effectively pretrain the DNN. Moreover, to
better train the DNN, the entire network is trained again by
a back-propagation algorithm after pretraining. Finally, assuming
that the relationship between HR/LR multispectral (MS) image
patches is the same as that between HR/LR PAN image patches,
the HR MS image is reconstructed from the observed LR MS
image using the trained DNN. Comparative experimental results
with several quality assessment indexes show that the proposed
method outperforms other pan-sharpening methods in terms of
visual perception and numerical measures.
Index Terms—Deep neural networks (DNNs), multispectral
(MS) image, panchromatic (PAN) image, pan-sharpening.
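The patch sampling step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the degradation model (block averaging followed by nearest-neighbor upsampling), the function names `make_lr_pan` and `sample_patch_pairs`, and the default patch size and resolution ratio are all assumptions made only for illustration.

```python
import numpy as np

def make_lr_pan(hr_pan, ratio=4):
    """Simulate an LR PAN image on the HR grid.

    Assumed degradation (not taken from the letter): block averaging by
    `ratio`, then nearest-neighbor upsampling back to the HR size.
    """
    h, w = hr_pan.shape
    h, w = h - h % ratio, w - w % ratio          # crop to a multiple of ratio
    cropped = hr_pan[:h, :w]
    coarse = cropped.reshape(h // ratio, ratio, w // ratio, ratio).mean(axis=(1, 3))
    lr = np.repeat(np.repeat(coarse, ratio, axis=0), ratio, axis=1)
    return cropped, lr

def sample_patch_pairs(hr_pan, patch=7, n_pairs=1000, ratio=4, seed=0):
    """Draw co-located HR/LR patch pairs from a single PAN image."""
    rng = np.random.default_rng(seed)
    hr, lr = make_lr_pan(hr_pan, ratio)
    h, w = hr.shape
    rows = rng.integers(0, h - patch + 1, size=n_pairs)   # top-left corners
    cols = rng.integers(0, w - patch + 1, size=n_pairs)
    hr_patches = np.stack([hr[r:r + patch, c:c + patch].ravel()
                           for r, c in zip(rows, cols)])
    lr_patches = np.stack([lr[r:r + patch, c:c + patch].ravel()
                           for r, c in zip(rows, cols)])
    return hr_patches, lr_patches
```

Each row of `lr_patches` would serve as a network input and the matching row of `hr_patches` as its target, so no images other than the PAN image at hand are needed.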
I. INTRODUCTION
EARTH observation satellites such as IKONOS and
QuickBird provide two different types of images: a
panchromatic (PAN) image with high spatial and low spectral
resolutions and a multispectral (MS) image with high spectral
and low spatial resolutions. Due to the technological limitations
of current satellite sensors, it is very difficult to acquire a
high spatial resolution MS image directly. As a postprocessing
method, pan sharpening can be employed to produce a high
spatial resolution MS image by fusing the information of the
PAN and MS images. The fusing process has become a key
preprocessing step in many remote sensing applications such as
feature detection and land-cover classification [1].
Manuscript received September 25, 2014; revised November 4, 2014;
accepted November 21, 2014. Date of publication January 22, 2015; date of
current version February 5, 2015. This work was supported in part by the
National Natural Science Foundation of China under Grant 11431015, Grant
61301215, Grant 61101194, and Grant 61171165; by the National Scientific
Equipment Developing Project of China under Grant 2012YQ050250; and by
the Jiangsu Provincial Postdoctoral Research Funding Plan of China under
Grant 1301025C.
W. Huang, L. Xiao, Z. Wei, and S. Tang are with the School of Computer
Science and Engineering, Nanjing University of Science and Technology,
Nanjing 210094, China (e-mail: hnhw235@163.com; xiaoliang@mail.njust.edu.cn;
gswei@mail.njust.edu.cn; ts198708@163.com).
H. Liu is with the School of Science, Nanjing University of Science and
Technology, Nanjing 210094, China (e-mail: hyliu@njust.edu.cn).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LGRS.2014.2376034
Over the last decades, various pan-sharpening methods
have been proposed to address the problem of remote sensing
image fusion. The earliest are the component-substitution
methods, which mainly include the methods based on the
intensity–hue–saturation (IHS) transform [2], [3], principal
component analysis (PCA) [4], and the Gram–Schmidt (GS)
transform [5]. These methods can achieve high spatial resolution
but suffer from severe spectral distortion. In contrast,
multiresolution-analysis-based wavelet transform methods [6] can
preserve good spectral information. The most famous one among them
is the à trous wavelet transform (ATWT) method [7], which
is simple but robust. However, these methods may suffer from
significant spatial distortion.
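The component-substitution idea can be sketched in a few lines. The example below follows the widely used additive form of IHS-like fusion, in which the difference between the PAN image and an intensity component is added to every MS band. It is a simplified illustration rather than the exact algorithm of [2]–[5]; the function name `ihs_like_fusion`, the equal-weight intensity, and the mean/standard-deviation matching of the PAN image are assumptions.

```python
import numpy as np

def ihs_like_fusion(ms_up, pan):
    """Additive IHS-like component substitution (illustrative sketch).

    ms_up : (H, W, B) MS image already upsampled to the PAN grid.
    pan   : (H, W) PAN image.
    """
    # Intensity as the equal-weight band average (an assumption).
    intensity = ms_up.mean(axis=2)
    # Match PAN mean and standard deviation to the intensity component.
    pan_matched = ((pan - pan.mean()) / (pan.std() + 1e-12)
                   * intensity.std() + intensity.mean())
    # The same spatial detail is injected into every band.
    detail = pan_matched - intensity
    return ms_up + detail[..., None]
```

Because one detail image is injected into all bands regardless of their individual spectral responses, the result is spatially sharp but can be spectrally distorted, which is the behavior described above.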
Compressive sensing (CS)-based pan-sharpening methods
have gained general acceptance in recent years. Li and Yang [8]
first proposed a CS-based pan-sharpening method, which
achieved great success. The shortcoming of the method is
that it needs plenty of high-resolution (HR) MS training
images, which may be unavailable. To deal with this problem,
Jiang et al. [9] constructed a joint dictionary from upsampled
low-resolution (LR) MS and HR PAN training images.
However, they still need to collect numerous LR MS and HR
PAN image pairs as the training set. Li et al. [10] proposed a
pan-sharpening method over a learned dictionary without a
training set. However, three dictionaries for the PAN and HR/LR
MS images must be constructed, which leads to expensive
computation. Zhu and Bamler [11] proposed a new
pan-sharpening method named SparseFI, which explores the sparse
representation of MS image patches in a dictionary trained only
from the PAN image without training images. These methods
have difficulty in choosing dictionary atoms when the structural
information is weak or lost. To overcome this problem, a
two-step sparse coding pan-sharpening method was proposed [12].
Because the CS-based methods assume that a sparse signal
can be represented as a linear combination of a few atoms in
an overcomplete dictionary, they share only a shallow linear
structure.
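The sparse representation assumption behind the CS-based methods can be made concrete with a small example. The sketch below uses orthogonal matching pursuit, a standard greedy sparse coding algorithm chosen here only for illustration; it is not claimed to be the solver used in [8]–[12], and the function name `omp` is hypothetical.

```python
import numpy as np

def omp(D, x, k):
    """Approximate x as a linear combination of at most k atoms (columns) of D."""
    residual = x.copy()
    support = []                      # indices of the selected atoms
    coef = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break                     # no new atom improves the fit
        support.append(idx)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef
```

The reconstruction `D @ coef` is linear in the selected atoms, which is the shallow linear structure referred to above.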
Recent research has shown that nonlinear deep neural
networks (DNNs) have great representational power
for complex structures and have obtained superior performance
in the field of image processing. For example, great success
has been achieved in image denoising and blind inpainting by
combining sparse coding and a DNN pretrained with denoising
autoencoders (DAs) [13]. Based on [13], Agostinelli et al. [14]
presented the adaptive multicolumn stacked sparse DA (SDA)
method, which can achieve state-of-the-art denoising
performance with a single system on a variety of different
noise types.
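For readers unfamiliar with DAs, a minimal single-layer sketch is given below. It is a plain denoising autoencoder with masking noise, sigmoid units, and a squared-error loss trained by full-batch gradient descent. It omits the sparsity term and is not the method of [13] or [14]; the function names and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_denoising_autoencoder(x, n_hidden=8, drop_prob=0.2,
                                lr=1.0, epochs=300, seed=0):
    """Train one DA layer to recover clean rows of x from corrupted copies."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        # Masking noise: randomly zero a fraction of the inputs.
        x_noisy = x * (rng.random(x.shape) >= drop_prob)
        h = sigmoid(x_noisy @ w1 + b1)            # encoder
        y = sigmoid(h @ w2 + b2)                  # decoder
        # Gradients of the mean of 0.5 * ||y - x||^2 over the samples.
        delta_out = (y - x) * y * (1.0 - y) / n
        delta_hid = (delta_out @ w2.T) * h * (1.0 - h)
        w2 -= lr * (h.T @ delta_out)
        b2 -= lr * delta_out.sum(axis=0)
        w1 -= lr * (x_noisy.T @ delta_hid)
        b1 -= lr * delta_hid.sum(axis=0)
    return w1, b1, w2, b2

def reconstruct(x, params):
    """Pass x through the encoder and decoder without added noise."""
    w1, b1, w2, b2 = params
    return sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)
```

Stacking several such layers, each trained on the hidden output of the previous one, gives a stacked DA that can be used to initialize a deep network before back-propagation.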
1545-598X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.