Pattern Recognition Letters 150 (2021) 258–264
journal homepage: www.elsevier.com/locate/patrec
Soft-Boundary Label Relaxation with class placement constraints for
semantic segmentation of the railway environment
Yuki Furitsu (a, ∗), Daisuke Deguchi (a), Yasutomo Kawanishi (a), Ichiro Ide (a), Hiroshi Murase (a), Hiroki Mukojima (b), Nozomi Nagamine (b)
(a) Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8601, Japan
(b) Railway Technical Research Institute, 2-8-38 Hikari-cho, Kokubunji-shi, Tokyo, 185-8540, Japan
Article info
Article history:
Received 18 January 2021
Revised 25 May 2021
Accepted 4 July 2021
Available online 27 July 2021
Edited by Prof. S. Sarkar
Keywords:
Semantic segmentation
Railway
Label relaxation
Abstract
In this paper, we focus on the challenging task of semantic segmentation of train front-view images. Trackside facilities can be managed using detailed and precise information about the surrounding railway environment. Semantic segmentation enables us to understand this environment in 2D, but no adequately large-scale dataset is available for training a CNN for this purpose. Some attempts have been made to generate pseudo-data from unlabeled sequential frames to compensate for the small volume of training data, but the moving speed of trains makes it difficult to apply them directly. We aim to solve this problem by proposing the Soft Boundary Label Relaxation (Soft-BLR) method, which considers label boundaries extending over multiple pixels in order to cope with more severely distorted pseudo-data and to better train the CNN in the initial training stage. Furthermore, we modify the loss function to penalize inference results based on their distance from the label boundary, in order to solve the misalignment problem of border pixels. Through experimental evaluation, we show that the proposed method outperforms previous methods not only on the semantic segmentation of challenging railway images but also on that of general street-view images.
© 2021 Elsevier B.V. All rights reserved.
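The abstract states that the loss is modified to penalize inference results according to their distance from the label boundary, but it does not give the formula. The NumPy sketch below is therefore a hypothetical illustration under one assumed reading: a misclassified pixel far from any ground-truth boundary cannot be explained by label misalignment, so its cross-entropy is weighted more heavily than an error right on the boundary. The boundary definition (a 4-neighbour label change), the city-block distance computed by breadth-first search, the linear weight min(1, (d + 1)/τ), and the names `boundary_mask`, `distance_to_boundary`, and `distance_weighted_ce` are all our own assumptions, not the authors' formulation.

```python
from collections import deque

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over the class axis (axis 0) of a (C, H, W) logit map."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def boundary_mask(labels: np.ndarray) -> np.ndarray:
    """True for pixels whose left/right/up/down neighbour carries a different label."""
    mask = np.zeros(labels.shape, dtype=bool)
    diff_x = labels[:, 1:] != labels[:, :-1]  # change between horizontal neighbours
    diff_y = labels[1:, :] != labels[:-1, :]  # change between vertical neighbours
    mask[:, 1:] |= diff_x
    mask[:, :-1] |= diff_x
    mask[1:, :] |= diff_y
    mask[:-1, :] |= diff_y
    return mask


def distance_to_boundary(labels: np.ndarray) -> np.ndarray:
    """City-block distance from each pixel to the nearest boundary pixel,
    via multi-source breadth-first search (inf when the map has no boundary)."""
    h, w = labels.shape
    dist = np.full((h, w), np.inf)
    queue = deque()
    for y, x in zip(*np.nonzero(boundary_mask(labels))):
        dist[y, x] = 0.0
        queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny, nx] == np.inf:
                dist[ny, nx] = dist[y, x] + 1.0
                queue.append((ny, nx))
    return dist


def distance_weighted_ce(logits: np.ndarray, labels: np.ndarray, tau: float = 3.0) -> float:
    """Cross-entropy averaged with HYPOTHETICAL weights min(1, (d + 1) / tau),
    so errors far from a ground-truth boundary cost more than errors on it."""
    probs = softmax(logits)
    rows, cols = np.indices(labels.shape)
    nll = -np.log(probs[labels, rows, cols] + 1e-12)
    weights = np.minimum(1.0, (distance_to_boundary(labels) + 1.0) / tau)
    return float((weights * nll).sum() / weights.sum())
```

With τ = 3, boundary pixels keep a weight of 1/3 and pixels two steps away already receive full weight, so flipping the prediction of a pixel deep inside a region raises the loss more than flipping one on the boundary. A smooth (for example exponential) ramp would be an equally plausible design choice.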
1. Introduction
Railways are a valued means of transportation owing to their speed, capacity, and reliability, and their total length exceeds a million kilometers around the globe. Given these characteristics, railway operators place particular emphasis on safety and accident prevention. Various technologies, from simple railway signals to more advanced Automatic Train Stop (ATS) systems, are used to ensure the safety of passengers. However, the geographical/geometrical positions and the types of trackside facilities are currently collected manually at a high human cost, and some railway operators do not even know which trackside facilities exist along their tracks, or where, owing to management problems across different departments. Daily maintenance of such facilities is also essential, yet it is still performed through manual and/or visual inspection. Therefore, railway operators urgently need a fully automatic technology that can collect data about trackside facilities and support their maintenance. To meet such needs, the use of semantic segmentation for understanding the railway environment is currently being considered.
∗ Corresponding author.
E-mail address: furitsuy@murase.is.i.nagoya-u.ac.jp (Y. Furitsu).
Semantic segmentation, the task of assigning a single semantic label to every pixel in an image, can be used to understand the surrounding environment in detail. Almost every modern semantic segmentation method uses a Convolutional Neural Network (CNN) and thus requires supervised training data [3]. For this reason, building an adequate dataset for semantic segmentation is a substantial issue: a typical pixel-level manual annotation of a single image takes more than an hour [4]. Training a CNN model generally requires a massive volume of training data, and constructing a large-scale dataset for every application is unrealistic. Although domain adaptation has been studied to transfer training results between similar domains, such as from synthetic to real-world data [10], this approach cannot be applied across dissimilar domains, such as from the street environment to the railway environment.
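To make the per-pixel nature of the task concrete, the following NumPy sketch shows the generic supervised objective used to train segmentation CNNs: a softmax over C class scores at every pixel, followed by a cross-entropy against an H×W map of annotated class indices. It is a textbook illustration with hypothetical function names (`pixelwise_cross_entropy`, `predict_labels`, `pixel_accuracy`), not the network or loss used in this paper; it simply shows why every pixel needs a ground-truth label.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over the class axis (axis 0) of a (C, H, W) logit map."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def pixelwise_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy between (C, H, W) logits and an (H, W) integer label map."""
    probs = softmax(logits)
    rows, cols = np.indices(labels.shape)
    # Probability the network assigns to each pixel's annotated class.
    p_true = probs[labels, rows, cols]
    return float(-np.log(p_true + 1e-12).mean())


def predict_labels(logits: np.ndarray) -> np.ndarray:
    """Assign each pixel the single class with the highest score."""
    return logits.argmax(axis=0)


def pixel_accuracy(pred: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of pixels whose predicted class matches the annotation."""
    return float((pred == labels).mean())


labels = np.array([[0, 1], [2, 0]])  # a 2x2 image with 3 classes
uninformed = np.zeros((3, 2, 2))     # equal scores for every class
print(pixelwise_cross_entropy(uninformed, labels))  # ≈ 1.0986 (= log 3)
```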
To cope with this lack of training data, Zhu et al. [18] proposed joint image-label propagation, which generates pseudo-data for the street-view image domain from a small number of labeled images and their neighboring unlabeled sequential frames. They also introduced Boundary Label Relaxation (BLR) to cope with the distorted training data generated by joint image-label propagation.
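As we understand Zhu et al. [18], BLR replaces the usual per-pixel likelihood at boundary pixels with the likelihood of the union of all classes found in a small window N around the pixel, i.e. L = -log Σ_{C∈N} P(C). The NumPy sketch below illustrates that formulation for a single image; away from boundaries, N contains only one class and the loss reduces to ordinary cross-entropy. The `radius` parameter and all function names are our own illustrative choices (radius=1 gives a 3×3 window); enlarging the window only loosely echoes the multi-pixel boundaries targeted by Soft-BLR and is not the Soft-BLR method itself.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over the class axis (axis 0) of a (C, H, W) logit map."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Standard mean per-pixel cross-entropy, for comparison."""
    probs = softmax(logits)
    rows, cols = np.indices(labels.shape)
    return float(-np.log(probs[labels, rows, cols] + 1e-12).mean())


def neighborhood_classes(labels: np.ndarray, num_classes: int, radius: int = 1) -> np.ndarray:
    """Boolean (C, H, W) mask, True where class c occurs anywhere in the
    (2*radius+1) x (2*radius+1) window centred on each pixel."""
    h, w = labels.shape
    one_hot = np.eye(num_classes, dtype=bool)[labels].transpose(2, 0, 1)
    # Pad with False so that windows at the image border stay in range.
    padded = np.pad(one_hot, ((0, 0), (radius, radius), (radius, radius)))
    mask = np.zeros_like(one_hot)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            mask |= padded[:, dy:dy + h, dx:dx + w]
    return mask


def boundary_relaxation_loss(logits: np.ndarray, labels: np.ndarray, radius: int = 1) -> float:
    """-log of the summed probability of all classes present in each pixel's
    window, averaged over pixels (the BLR idea as described above)."""
    probs = softmax(logits)
    mask = neighborhood_classes(labels, logits.shape[0], radius)
    p_union = (probs * mask).sum(axis=0)
    return float(-np.log(p_union + 1e-12).mean())


labels = np.array([[0, 1]])                            # a two-pixel boundary
logits = np.array([[[-10.0, -10.0]], [[10.0, 10.0]]])  # confidently predicts class 1
print(cross_entropy(logits, labels))             # ≈ 10: pixel 0 is labelled 0
print(boundary_relaxation_loss(logits, labels))  # ≈ 0: both classes lie in each window
```

The design choice is that the network is not penalized for how it distributes probability among classes that legitimately meet at a boundary, which makes training tolerant of slightly misaligned propagated labels.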
In joint image-label propagation, pseudo-data are generated by
https://doi.org/10.1016/j.patrec.2021.07.014
0167-8655/© 2021 Elsevier B.V. All rights reserved.