无分割蒙古传统OCR：序列到注意力模型的高效识别

13 浏览量更新于2024-08-26 收藏 197KB PDF 举报

本研究论文关注于蒙古族传统印刷文本的光学字符识别（Optical Character Recognition, OCR）技术，特别是在解决蒙古文字图像分割难题上的创新方法。当前的蒙古OCR系统普遍依赖于图像分割步骤，然而蒙古文的字形结构复杂，分割过程相对困难，导致这类方法成本高昂且易出错。作者提出了一个基于序列到序列（Sequence-to-Sequence, Seq2Seq）模型并集成注意力机制的无分割解决方案。在论文中，作者将OCR任务重新定义为序列到序列映射问题，即输入的蒙古文单词图像被视为一系列连续的图像帧，而输出的文本字符串则视为一系列字母的序列。这种转变摒弃了传统的逐字符或逐像素的分割过程，转而利用深度学习框架中的Seq2Seq模型，该模型能够捕捉输入和输出之间的长期依赖关系，并通过注意力机制动态聚焦于输入的关键部分，从而提高识别精度。注意力机制使得模型能够根据不同部分的重要性给予不同的权重，这对于处理像蒙古文这样的多变字符语言至关重要。这种方法的优势在于它能够处理不同形状和大小的蒙古字母，同时减少了由于分割不精确带来的错误可能性。实验结果表明，提出的无分割蒙古传统OCR方法在处理印刷文本时表现出较高的识别效率和准确性。相比于现有基于分割的方法，新方法不仅简化了流程，降低了计算成本，还提高了识别性能，对于蒙古文化资源的数字化和利用具有重要意义。总结来说，这篇研究论文的核心贡献在于提出了一种创新的OCR技术，利用序列到注意力模型解决了蒙古文印刷文本的识别问题，这在推动蒙古族文化数字化和降低OCR技术应用门槛方面具有重大价值。

Segmentation-free Printed Traditional

Mongolian OCR Using Sequence to Sequence

with Attention Model

Hui Zhang, Hongxi Wei*, Feilong Bao, Guanglai Gao

School of Computer Science

Inner Mongolia University

Hohhot, China

Email(*): cswhx@imu.edu.cn

Abstract—Mongolian Optical Character Recognition (OCR)

systems are required for printed document digitization and

Mongolian cultural resources utilization. Existing Mongolian

OCR systems are based on segmentation. But, the Mongolian

segmentation is more difﬁcult than other languages. So, these

methods are highly costly and error suffering. In this study, a

segmentation-free based traditional Mongolian word recognition

method is proposed. Speciﬁcally, we formalize the OCR task

as a sequence to sequence mapping problem, in which the

input Mongolian word image and the output textual string are

treated as a sequence of image frames and a sequence of letters,

respectively. A sequence to sequence with attention model is

adopted to solve this problem. Experimental results on a dataset

show the effectiveness of the proposed method.

Keywords-Mongolian; Optical character recognition (OCR);

Sequence to sequence; Attention; LSTM

I. INTRODUCTION

Mongolian is one of the major ethnic languages in China.

About 6 million people use Mongolian all over the world

[1]. Mongolian Optical Character Recognition (OCR) systems

are required for printed document digitization and Mongolian

cultural resources utilization.

Traditional Mongolian language has a unique writing

style which is quite different from Chinese and English, as

illustrated in Fig. 1. Firstly, its writing order is vertical from

top to bottom and the column order is from left to right.

Secondly, all letters of one Mongolian word are conglutinated

together in the vertical direction to form a backbone.

Thirdly, letters have initial, medial or ﬁnal presentation forms

according to their positions within a word.

Most OCR methods presented for Latin or Chinese

characters assume that individual characters can be easily

isolated. But it is not true for the cursive Mongolian script.

Various segmentation methods which are usually based on

projection, backbone analysis, and word contour tracing have

been proposed to segment Mongolian words into individual

Fig. 1. A sample of the traditional Mongolian text.

characters [2] [3] [4]. However, the segmentation is quite

costly and also increases the chances of errors.

As far as our knowledge, all of the existing Mongolian OCR

systems are segmentation-based, which need a segmentation

step, before identifying the detected characters with a

classiﬁer. In this paper, we proposed a segmentation-free

Mongolian OCR system, which recognizes the traditional

Mongolian words directly. Traditional Mongolian language

has a very large vocabulary, daily used vocabulary is about

0.1 to 1 million. Training a classiﬁer to distinguish every

word is a nightmare. And out-of-vocabulary words cannot

be recognized in this way. To make the training more easily

and overcome the out-of-vocabulary issues, we formalize the

OCR as a sequence to sequence mapping problem, which

treats the input word image as a sequence of image frames

and treats the output word as a sequence of letters. The

model is trained to obtain the relationship between letters and

glyphs. It recognizes each letter and then concatenates them

into words.

The recognition model consists of two Long Short-term

Memories (LSTMs). The ﬁrst one is an encoder network

to consume the input image frame sequences. The second

one is a decoder network to generate output texts. Attention

connections are added from the decoder to the encoder. These

attention mechanisms can improve parallelism and decrease

2017 14th IAPR International Conference on Document Analysis and Recognition

DOI 10.1109/ICDAR.2017.101

585

下载后可阅读完整内容，剩余5页未读，立即下载

weixin_38637918

粉丝: 9
资源: 946

无分割蒙古传统OCR：序列到注意力模型的高效识别

注意力模型Python程序

基于隐马尔可夫模型回归HMMR模型的时间序列分割处理matlab仿真+代码仿真操作视频

15.时间序列预测（LSTM模型）python代码实现

Transformer: 基于注意力的高效序列模型

Transformer模型：注意力机制重塑序列转换

PyTorch实现序列到序列模型的安装与使用指南

时间序列预测与回归模型详解

MATLAB实现时间序列分析与ARMA模型生成

时间序列分析：线性模型拟合与人口预测

R语言时间序列分析：ARIMA模型入门指南

最新资源