X. Liu et al.
Fig. 2 Flowchart of the method presented by Neumann et al. [1]. This method introduced maximally stable extremal regions (MSERs), which
provide robustness to geometric and illumination changes
Fig. 3 Results of scene text detection by Epshtein et al. [2]. Their
method was the first to introduce the stroke width transform (SWT) to distinguish
text objects from non-text objects in cluttered backgrounds
series of conditions, eventually, the text line is formed (see
Fig. 3). Experiments show that SWT is highly efficient for
text detection. This operator can detect text in many fonts
and languages, and it is insensitive to variations in scale and
orientation. Nevertheless, SWT requires many human-defined
constraints, so it may fail in some challenging cases.
Yin et al. [3] developed MSER-based methods. They first
extracted character candidates with their proposed MSER pruning
algorithm. Second, a single-link clustering algorithm was
adopted to group the character candidates into text candidates.
Then, they trained a character classifier to eliminate
non-text candidates. Finally, an AdaBoost classifier was used
to detect text. However, there is room for further progress in
detecting multi-orientation, multi-language or highly blurred
text in low-resolution natural scene images.
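The single-link clustering step can be illustrated with a minimal sketch. It is not the implementation of Yin et al. [3]: it assumes that each character candidate is reduced to a centre point, that the distance is plain Euclidean distance, and that clustering stops at a fixed cutoff. The names single_link_clusters and max_dist are hypothetical. With a fixed cutoff, single-link clustering reduces to finding connected components of the graph that links every pair of candidates closer than the cutoff.

```python
from math import hypot


def single_link_clusters(points, max_dist):
    """Single-link clustering with a fixed distance cutoff (illustrative only).

    Two points end up in the same cluster when a chain of points connects
    them in which each consecutive pair is at most max_dist apart.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            # Assumption: Euclidean distance between candidate centres.
            if hypot(x1 - x2, y1 - y2) <= max_dist:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two clusters

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical candidate centres: three characters on one line, two on another.
centres = [(0, 0), (10, 0), (20, 0), (100, 50), (110, 50)]
print(single_link_clusters(centres, max_dist=12))
```

The pairwise loop is quadratic in the number of candidates, which is acceptable for a sketch but would need a spatial index for large candidate sets.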
The method proposed by Neumann et al. [4] treats the
character detection problem as an efficient sequential selection
from the set of extremal regions (ERs). This method requires
less memory, runs faster and maintains real-time performance.
Also building on ERs, Cho
et al. [5] presented an effective algorithm that can detect
various kinds of text. The algorithm extracts character candidates
as ERs, and non-maximum suppression
(NMS) is used to guarantee their uniqueness and compactness.
In addition, double thresholding and hysteresis tracking
are adopted so that even candidates with
low confidence can be detected. This method achieves a high recall rate but is
computationally expensive.
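The double-threshold idea can be sketched as follows. This is a generic hysteresis scheme, not the tracking procedure of Cho et al. [5]: candidates scoring at least a high threshold are accepted outright, and candidates scoring at least a low threshold are accepted only if they are linked to an already accepted candidate. The function name hysteresis_select, the neighbors adjacency map, and the threshold values are assumptions made for illustration.

```python
from collections import deque


def hysteresis_select(scores, neighbors, high, low):
    """Select candidates by double-threshold hysteresis (illustrative only).

    scores    : list of confidence values, one per candidate
    neighbors : dict mapping a candidate index to adjacent candidate indices
                (assumption: adjacency is precomputed, e.g. from spatial
                proximity)
    Returns the sorted indices of accepted candidates.
    """
    # Strong candidates are accepted unconditionally and act as seeds.
    accepted = {i for i, s in enumerate(scores) if s >= high}
    queue = deque(accepted)

    # Grow outward from the seeds through candidates that pass the low threshold.
    while queue:
        i = queue.popleft()
        for j in neighbors.get(i, ()):
            if j not in accepted and scores[j] >= low:
                accepted.add(j)
                queue.append(j)
    return sorted(accepted)


# Hypothetical scores: candidate 4 passes the low threshold but is isolated.
scores = [0.9, 0.5, 0.45, 0.2, 0.5]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
print(hysteresis_select(scores, neighbors, high=0.8, low=0.4))
```

In this example, candidates 1 and 2 are recovered because they chain back to the strong candidate 0, whereas the isolated candidate 4 is rejected despite having the same score as candidate 1. This is how hysteresis raises recall without accepting every weak response.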
An efficient stroke detector was proposed by Busta et
al. [6]. It makes three main contributions. First, stroke
ending keypoints (SEKs) and stroke bend keypoints (SBKs)
were introduced to detect stroke keypoints, which are then exploited
to produce stroke segmentations. Second, they trained an
AdaBoost-based classifier to separate text fragments from background
clutter. Finally, based on text direction voting, they
adopted a text clustering technique to group individual characters
into text lines. It is worth noting that, besides being
fast, this method is scale- and rotation-invariant and supports
a wide variety of scripts and fonts. However, it may fail
in some challenging cases, such as low image contrast or compact
characters.
3.1.2 Texture-based methods
The idea behind texture-based methods is that text in an image
has distinct textural properties, which can distinguish it