Hierarchical Weakly Supervised Learning for
Residential Area Semantic Segmentation in
Remote Sensing Images
Libao Zhang, Member, IEEE, Jie Ma, Xinran Lv, and Donghui Chen
Abstract— Residential-area segmentation is one of the most
fundamental tasks in the field of remote sensing. Recently, fully
supervised convolutional neural network (CNN)-based methods
have shown superiority in the field of semantic segmentation.
However, a serious problem for those CNN-based methods is that
pixel-level annotations are expensive and laborious. In this study,
a novel hierarchical weakly supervised learning (HWSL) method
is proposed to realize pixel-level semantic segmentation in remote
sensing images. First, a weakly supervised hierarchical saliency
analysis is proposed to capture a sequence of class-specific
hierarchical saliency maps by computing the gradient maps with
respect to the middle layers of the CNN. Then, superpixels
and low-rank matrix recovery are introduced to highlight the
common salient areas and fuse class-specific saliency maps with
adaptive weights. Finally, a subtraction operation between class-
specific saliency maps is conducted to generate hierarchical
residual saliency maps and fulfill residential-area segmentation.
Comprehensive evaluations with two remote sensing data sets
and comparison with seven methods validate the superiority of
the proposed HWSL model.
Index Terms— Deep learning, remote sensing, saliency analysis,
semantic segmentation, weakly supervised.
I. INTRODUCTION
The semantic segmentation of residential areas, i.e., annotating
residential areas pixelwise in remote sensing
images (RSIs) [1], is a fundamental task in the field of
remote sensing. During the past few years, deep learning,
which can automatically discover problem-specific features,
has received extensive attention in image
segmentation and object detection tasks.
In particular, convolutional neural networks (CNNs) [2]
are the most widely used deep-learning architecture. They
typically require large numbers of training images to
avoid overfitting and to improve the generalization
ability of the network. With the development of CNNs,
the accuracy of the segmentation task has been boosted
significantly. Shelhamer et al. [3] built fully convolutional
networks (FCNs), which take input of arbitrary size and
produce correspondingly sized output with efficient inference
and learning. Badrinarayanan et al. [4] designed a trainable
segmentation engine, which consists of an encoder network
and a corresponding decoder network followed by a pixelwise
classification layer. Chen et al. [5] brought together meth-
ods from deep CNNs (DCNNs) and probabilistic graphical
models for addressing the task of pixel-level classification.
Ronneberger et al. [6] designed a segmentation architecture
consisting of a contracting path to capture context and a
symmetric expanding path that enables precise localization.
As these frameworks are all optimized with pixel-level
loss functions, their performance depends on a large
amount of annotated data. Therefore, a common bottleneck
of these methods is that they operate
in a fully supervised manner, i.e., they typically require
large numbers of pixel-level annotations in the training phase. The
process is inevitably expensive, laborious, and also prone to
error. Because of the complicated surface features and rich
background interference in RSIs, labeling them pixel by
pixel is even more labor-intensive.
Weakly supervised annotations, in the form of bounding
boxes (approximate location) and image-level labels (whether
the input image contains objects), are much easier to acquire
compared to precise pixel-level annotations. Weakly super-
vised methods, which rely on weakly supervised annota-
tions, can therefore be viewed as the means to address
the limitation of fully supervised CNN-based approaches.
Simonyan et al. [7] utilized gradient maps to achieve
object localization in natural scenes in a weakly supervised
way. However, the gradient saliency maps of RSIs are less
satisfactory, since the gray levels of RSIs vary drastically.
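As an illustration of this gradient-based saliency idea, a minimal PyTorch sketch is given below. It is not the implementation of [7] or of this letter: the choice of VGG-16, the preprocessing assumptions, and the function name class_saliency_map are our own for exposition.

    import torch
    from torchvision import models

    # Illustrative pretrained classifier (any image-level classification CNN works).
    model = models.vgg16(pretrained=True).eval()

    def class_saliency_map(image, class_idx):
        # image: normalized tensor of shape (1, 3, H, W)
        image = image.clone().requires_grad_(True)
        score = model(image)[0, class_idx]         # pre-softmax class score
        score.backward()                           # gradient of the score w.r.t. the input
        saliency, _ = image.grad.abs().max(dim=1)  # channel-wise maximum of |gradient|
        return saliency.squeeze(0)                 # (H, W) class saliency map

The saliency map is simply the magnitude of the class-score gradient at each input pixel, which indicates how strongly that pixel influences the predicted class.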
In this work, a novel hierarchical weakly supervised learn-
ing (HWSL) model is proposed to realize semantic seg-
mentation with only image-level annotations. Here, image-level tags
are used to train a classification CNN, which is also respon-
sible for generating the class-specific gradient hierarchical
saliency maps (CS-GHSMs) with respect to middle convo-
lutional layers. As the layers go deeper, those CS-GHSMs
can progressively capture local and global salient features,
which are beneficial to segmentation tasks [1]. Multiscale
features are then integrated by fusing the CS-GHSMs
with the help of superpixels and low-rank matrix recovery.
Finally, a subtraction operation between foreground and back-
ground fused saliency maps is implemented to suppress the
background.
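A minimal sketch of taking class-specific gradient saliency maps at middle convolutional layers is shown below, again assuming PyTorch and VGG-16. The tapped layer indices, the bilinear upsampling, and the final foreground-minus-background subtraction are illustrative assumptions; the superpixel and low-rank fusion steps of HWSL are omitted here.

    import torch
    import torch.nn.functional as F
    from torchvision import models

    model = models.vgg16(pretrained=True).eval()
    # Example middle layers to tap (our own choice, not the letter's configuration).
    taps = [model.features[15], model.features[22], model.features[29]]

    def hierarchical_saliency(image, class_idx):
        feats = []
        hooks = [m.register_forward_hook(lambda _m, _i, out: feats.append(out))
                 for m in taps]
        score = model(image)[0, class_idx]          # class score from image-level training
        grads = torch.autograd.grad(score, feats)   # gradients w.r.t. middle feature maps
        for h in hooks:
            h.remove()
        maps = []
        for g in grads:
            m = g.abs().max(dim=1, keepdim=True)[0]  # channel-wise maximum
            m = F.interpolate(m, size=image.shape[-2:],
                              mode="bilinear", align_corners=False)
            maps.append(m.squeeze())                 # one saliency map per tapped layer
        return maps

    # Illustrative background suppression by subtraction between fused maps:
    # residual = torch.clamp(foreground_map - background_map, min=0)

Deeper taps yield coarser, more global responses, while shallower taps keep finer spatial detail, which is the hierarchy exploited by the proposed fusion.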
The major contributions are as follows.
1) A novel weakly supervised semantic segmentation
method is proposed to generate accurate saliency maps
in RSIs using image-level labels rather than pixel-level
labels, which saves considerable labeling effort.