𝐹
𝑃
5
𝑃
4
𝑃
2
𝑃
3
𝐶
𝑆
1
𝑆
𝑛−1
𝑆
𝑛
…
Progressive Scale Expansion
𝑅
Figure 2: Illustration of our overall pipeline. The left part is implemented from FPN [16]. The right
part denotes the feature fusion and the progressive scale expansion algorithm.
[19] utilized corner localization to find suitable irregular quadrangles for text instances. The de-
tection manners are evolving from horizontal rectangle to rotated rectangle and further to irregular
quadrangle. However, besides the quadrangular shape, there are many other shapes of text instances
in natural scene. Therefore, some researches began to explore curve text detection and obtained
certain results. [18] tried to regress the relative positions for the points of a 14-sided polygon. [31]
detected curve text by locating two end points in the sliding line which slides both horizontally and
vertically. A fused detector was proposed in [1] based on bounding box regression and semantic
segmentation. However, since their current performances are not very satisfied, there is still a large
space for promotion in curve text detection, and the detectors for arbitrary-shaped texts still need
more explorations.
3 Proposed Method
In this section, we first introduce the overall pipeline of the proposed Progressive Scale Expansion
Network (PSENet). Next, we present the details of progressive scale expansion algorithm, and show
how it can effectively distinguish the adjacent text instances. Further, the way of generating label
and the design of loss function are introduced. At last, we describe the implementation details of
PSENet.
3.1 Overall Pipeline
The overall pipeline of the proposed PSENet is illustrated in Fig. 2. Inspired by FPN [16], we
concatenate low-level feature maps with high-level feature maps and thus have four concatenated
feature maps. These maps are further fused in F to encode informations with various receptive
views. Intuitively, such fusion is very likely to facilitate the generations of the kernels with various
scales. Then the feature map F is projected into n branches to produce multiple segmentation
results S
1
, S
2
, ..., S
n
. Each S
i
would be one segmentation mask for all the text instances at a certain
scale. The scales of different segmentation mask are decided by the hyper-parameters which will be
discussed in Sec. 3.3. Among these masks, S
1
gives the segmentation result for the text instances
with smallest scales (i.e., the minimal kernels) and S
n
denotes for the original segmentation mask
(i.e., the maximal kernels). After obtaining these segmentation masks, we use progressive scale
expansion algorithm to gradually expand all the instances’ kernels in S
1
, to their complete shapes in
S
n
, and obtain the final detection results as R.
3.2 Progressive Scale Expansion Algorithm
As shown in Fig. 1 (c), it is hard for segmentation-based method to separate the text instances that
are close to each other. To solve this problem, we propose the progressive scale expansion algorithm.
Here is a vivid example (see Fig. 3) to explain the procedure of progressive scale expansion algo-
rithm, whose central idea is brought from the Breadth-First-Search (BFS) algorithm. In the example,
we have 3 segmentation results S = {S
1
, S
2
, S
3
} (see Fig. 3 (a), (e), (f)). At first, based on the
3