Deep-Learning-Driven End-to-End Text Detection and Recognition: The MaskTextSpotter Model


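"Xiang Bai's 'End-to-End Text Detection and Recognition' is a paper on OCR technology. It focuses on MaskTextSpotter, a scene text detection and recognition method based on deep neural networks."

Deep learning has brought significant progress to computer vision, and to optical character recognition (OCR) in particular. The paper, written by Xiang Bai and co-authors, presents a new model named MaskTextSpotter that addresses text detection and text recognition in natural images at the same time, that is, scene text localization (text detection) and recognition (text recognition). Scene text detection locates text in images with complex backgrounds; recognition converts the detected text into a readable character sequence.

The paper proposes a new neural network model that is trained end to end and is inspired by the recently published Mask R-CNN. Unlike earlier methods that also train a deep network end to end for text detection and recognition, MaskTextSpotter uses a simple and smooth learning procedure, so accurate detection and recognition are obtained together, without complicated stage-wise training or post-processing.

Mask R-CNN is a deep learning architecture for instance segmentation. It extends Faster R-CNN with a "mask branch" that produces pixel-level predictions, so the model not only detects objects but also segments their outlines. MaskTextSpotter borrows this idea and applies it to text detection and recognition, which lets it localize and read text of arbitrary shape.

The authors describe the model structure, the training strategy, and the experimental results. They validate MaskTextSpotter in a series of experiments, compare it with existing methods, and likely report its performance on several challenging datasets. An end-to-end model of this kind matters for the efficiency and accuracy of OCR systems, especially in applications such as autonomous driving, intelligent surveillance, and document analysis. The paper studies the use of deep learning for text detection and recognition in depth, proposes a new and efficient model, helps advance OCR technology, and offers a useful reference for later research.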

Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai

2.2 Scene Text Recognition

Scene text recognition [53, 46] aims at decoding the detected or cropped image regions into character sequences. Previous scene text recognition approaches can be roughly split into three branches: character-based methods, word-based methods, and sequence-based methods. Character-based recognition methods [2, 22] mostly first localize individual characters and then recognize and group them into words. In [20], Jaderberg et al. propose a word-based method that treats text recognition as a classification problem over a vocabulary of common English words (90k). Sequence-based methods treat text recognition as a sequence labeling problem. In [44], Shi et al. use a CNN and an RNN to model image features and output the recognized sequences with CTC [11]. In [26, 45], Lee et al. and Shi et al. recognize scene text with an attention-based sequence-to-sequence model.
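
To make the sequence-labeling view concrete, the following is a minimal sketch of greedy (best-path) CTC decoding, the step that turns per-frame class probabilities into a string. It is not taken from [44] or [11]; the function name `ctc_greedy_decode`, the convention that class 0 is the blank symbol, and the toy probabilities are all assumptions made for illustration.

```python
from typing import Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_probs[t][k] is the probability of class k at time step t.
    Class k > 0 maps to alphabet[k - 1] (hypothetical convention).
    """
    # Pick the most likely class at every time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    decoded = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when it is not blank and not a repeat
        # of the immediately preceding frame.
        if label != BLANK and label != previous:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


frames = [
    [0.10, 0.80, 0.05, 0.05],  # a
    [0.10, 0.70, 0.10, 0.10],  # a (repeat, merged with the frame above)
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.20, 0.60, 0.10, 0.10],  # a (kept, because a blank separates it)
    [0.10, 0.10, 0.70, 0.10],  # b
    [0.10, 0.10, 0.60, 0.20],  # b (repeat)
    [0.80, 0.10, 0.05, 0.05],  # blank
]
print(ctc_greedy_decode(frames, "abc"))  # aab
```

The blank symbol is what allows a genuinely doubled letter to survive the merge step. The decoder reads frames in a fixed 1-D order, which is the property the authors contrast with their own approach for irregular text.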

The proposed text recognition component in our framework can be classified as a character-based method. However, in contrast to previous character-based approaches, we use an FCN [42] to localize and classify characters simultaneously. Besides, compared with sequence-based methods, which are designed for a 1-D sequence, our method is more suitable for handling irregular text (multi-oriented text, curved text, etc.).

2.3 Scene Text Spotting

Most of the previous text spotting methods [21, 30, 12, 29] split the spotting process into two stages. They first use a scene text detector [21, 30, 29] to localize text instances and then use a text recognizer [20, 44] to obtain the recognized text. In [27, 3], Li et al. and Busta et al. propose end-to-end methods that localize and recognize text in a unified network, but these require relatively complex training procedures. Compared with these methods, our proposed text spotter can not only be trained fully end-to-end, but can also detect and recognize scene text of arbitrary shape (horizontal, oriented, and curved).

2.4 General Object Detection and Semantic Segmentation

With the rise of deep learning, general object detection and semantic segmentation have advanced rapidly. A large number of object detection and segmentation methods [9, 8, 40, 6, 32, 33, 39, 42, 5, 28, 13] have been proposed. Benefiting from those methods, scene text detection and recognition have made clear progress in the past few years. Our method is also inspired by those methods. Specifically, our method is adapted from a general object instance segmentation model, Mask R-CNN [13]. However, there are key differences between the mask branch of our method and that of Mask R-CNN. Our mask branch not only segments text regions but also predicts character probability maps, which means that our method can recognize the character sequence of an instance from its character maps rather than predicting an object mask only.
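
As a rough illustration of how a character sequence might be read out of character probability maps, the sketch below thresholds a background map, groups the remaining pixels into connected regions, lets each region vote for a character class, and orders the regions left to right. This excerpt does not specify the authors' decoding procedure, so the function name `decode_character_maps`, the threshold of 0.5, the 4-connectivity, and the left-to-right ordering are all assumptions rather than the paper's algorithm.

```python
from collections import deque
from typing import Dict, List, Tuple

Grid = List[List[float]]  # one H x W probability map


def decode_character_maps(background: Grid,
                          char_maps: Dict[str, Grid],
                          threshold: float = 0.5) -> str:
    """Read a character sequence from per-class probability maps (illustrative).

    background[y][x] is the probability that a pixel is NOT part of a character;
    char_maps[c][y][x] is the probability that the pixel belongs to character c.
    """
    height, width = len(background), len(background[0])
    seen = [[False] * width for _ in range(height)]
    regions: List[Tuple[float, str]] = []  # (mean x of region, voted character)

    for y in range(height):
        for x in range(width):
            if seen[y][x] or background[y][x] >= threshold:
                continue
            # Flood-fill one 4-connected region of character pixels.
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and background[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Vote: the class with the largest summed probability over the
            # region wins (equivalent to the largest mean probability).
            voted = max(char_maps,
                        key=lambda c: sum(char_maps[c][py][px] for py, px in pixels))
            mean_x = sum(px for _, px in pixels) / len(pixels)
            regions.append((mean_x, voted))

    # Assumption: text reads left to right, so order regions by horizontal centre.
    return "".join(char for _, char in sorted(regions))


bg = [[1.0, 0.1, 0.1, 1.0, 0.2, 0.2, 1.0]] * 2
maps = {
    "A": [[0.0, 0.9, 0.8, 0.0, 0.1, 0.2, 0.0]] * 2,
    "B": [[0.0, 0.1, 0.2, 0.0, 0.7, 0.9, 0.0]] * 2,
}
print(decode_character_maps(bg, maps))  # AB
```

Because each character is found in 2-D and classified independently, nothing in this readout requires the text to lie on a straight line; only the final ordering step would need to change for curved or vertical text.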
