Segmentation-free Printed Traditional
Mongolian OCR Using Sequence to Sequence
with Attention Model
Hui Zhang, Hongxi Wei*, Feilong Bao, Guanglai Gao
School of Computer Science
Inner Mongolia University
Hohhot, China
Email(*): cswhx@imu.edu.cn
Abstract—Mongolian Optical Character Recognition (OCR)
systems are required for printed document digitization and
the utilization of Mongolian cultural resources. Existing Mongolian
OCR systems are based on segmentation. However, segmenting
Mongolian script is more difficult than segmenting other scripts,
so these methods are costly and error-prone. In this study, a
segmentation-free traditional Mongolian word recognition
method is proposed. Specifically, we formalize the OCR task
as a sequence to sequence mapping problem, in which the
input Mongolian word image and the output textual string are
treated as a sequence of image frames and a sequence of letters,
respectively. A sequence to sequence model with attention is
adopted to solve this problem. Experimental results on a dataset
show the effectiveness of the proposed method.
Keywords-Mongolian; Optical character recognition (OCR);
Sequence to sequence; Attention; LSTM
I. INTRODUCTION
Mongolian is one of the major ethnic languages in China.
About 6 million people use Mongolian all over the world
[1]. Mongolian Optical Character Recognition (OCR) systems
are required for printed document digitization and Mongolian
cultural resources utilization.
The traditional Mongolian script has a unique writing
style that is quite different from that of Chinese and English, as
illustrated in Fig. 1. Firstly, it is written vertically from
top to bottom, and the columns run from left to right.
Secondly, all letters of a Mongolian word are joined
together in the vertical direction to form a backbone.
Thirdly, letters take initial, medial or final presentation forms
according to their positions within a word.
Most OCR methods proposed for Latin or Chinese
characters assume that individual characters can be easily
isolated. This assumption does not hold for the cursive Mongolian script.
Various segmentation methods, usually based on
projection, backbone analysis, and word contour tracing, have
been proposed to segment Mongolian words into individual
characters [2] [3] [4]. However, segmentation is
costly and increases the chance of errors.
Fig. 1. A sample of the traditional Mongolian text.
To the best of our knowledge, all existing Mongolian OCR
systems are segmentation-based: they require a segmentation
step before the detected characters are identified by a
classifier. In this paper, we propose a segmentation-free
Mongolian OCR system, which recognizes traditional
Mongolian words directly. Traditional Mongolian
has a very large vocabulary; the vocabulary in daily use is about
0.1 to 1 million words. Training a classifier to distinguish every
word is impractical, and out-of-vocabulary words cannot
be recognized in this way. To make training easier
and to overcome the out-of-vocabulary issue, we formalize
OCR as a sequence to sequence mapping problem, which
treats the input word image as a sequence of image frames
and the output word as a sequence of letters. The
model is trained to learn the relationship between letters and
glyphs. It recognizes each letter and then concatenates the letters
into words.
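As an illustration of this formalization, the following minimal sketch shows one way a word image could be turned into a sequence of image frames and a word into a sequence of letter indices. It is not the configuration used in this paper: the frame height, the stride, the zero padding, the end-of-sequence index and the Latin placeholder alphabet are all assumptions made only for the example, and the function names are hypothetical.

```python
import numpy as np


def image_to_frames(image, frame_height=4, stride=2):
    """Cut a word image (height x width) into frames along the vertical axis.

    Mongolian is written top to bottom, so the window slides down the image.
    frame_height and stride are illustrative values, not taken from the paper.
    """
    height, _ = image.shape
    # Zero-pad the bottom so that the last window is complete (an assumption).
    if height < frame_height:
        pad = frame_height - height
    else:
        pad = (-(height - frame_height)) % stride
    padded = np.pad(image, ((0, pad), (0, 0)), mode="constant")
    starts = range(0, padded.shape[0] - frame_height + 1, stride)
    # Each frame is flattened into one feature vector.
    return np.stack([padded[s:s + frame_height].reshape(-1) for s in starts])


def letters_to_ids(letters, alphabet):
    """Map letter symbols to integer ids and append an end-of-sequence id."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    eos = len(alphabet)  # hypothetical end-of-sequence marker
    return [index[s] for s in letters] + [eos]


# Toy input: a blank 10 x 6 "word image" and a placeholder three-letter word.
word_image = np.zeros((10, 6), dtype=np.float32)
frames = image_to_frames(word_image, frame_height=4, stride=2)
target = letters_to_ids(["a", "b", "a"], alphabet=["a", "b", "c"])
```

Under these assumptions, `frames` is the input sequence (one row per frame) and `target` is the output letter sequence. Because the target is built from letters rather than whole words, a word that never appeared in training can still be spelled out letter by letter.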
The recognition model consists of two Long Short-Term
Memory (LSTM) networks. The first is an encoder network
that consumes the input image frame sequence. The second
is a decoder network that generates the output text. Attention
connections are added from the decoder to the encoder. These
attention mechanisms can improve parallelism and decrease
2017 14th IAPR International Conference on Document Analysis and Recognition
2379-2140/17 $31.00 © 2017 IEEE
DOI 10.1109/ICDAR.2017.101