An Overview of Deep Learning-Driven Scene Text Recognition Technology

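"Scene Text Recognition Based on Deep Learning: A Brief Survey" is a research paper by Yuxin Chen and Yunxue Shao of the College of Computer Science, Inner Mongolia University. It examines the application and development of deep learning in scene text recognition. Scene text recognition is an active topic in computer vision. Compared with traditional document text recognition it is more complex, with challenges in font, text distribution, and background, and traditional optical character recognition (OCR) technology struggles to cope with them. As deep learning has advanced and achieved notable results in image recognition, it has been widely applied to scene text recognition. Keywords: deep learning, scene text recognition, convolutional neural networks, recurrent neural networks.

I. Introduction. Communication and interaction between people depend heavily on text. Scene text is everywhere in the real world, on road signs, billboards, screens, and so on, so effective scene text recognition matters for automated systems and services. In recent years, the development of deep learning models, especially convolutional neural networks (CNNs) and recurrent neural networks (RNNs), has markedly improved the accuracy and efficiency of scene text recognition.

II. Deep learning basics. Deep learning is a machine learning approach loosely modeled on the neural network structure of the human brain, and it performs particularly well in image processing and natural language processing. CNNs are good at capturing spatial features in images, while RNNs are suited to sequence data such as text. Both models play a key role in scene text recognition.

III. CNNs in scene text recognition. A CNN learns local image features through multiple layers of filters, which is effective for recognizing the shapes and arrangement of characters. Combined with fully connected layers, a CNN can classify the whole image and thereby recognize the whole text string.

IV. The role of RNNs and LSTMs in sequence modeling. Because text is sequential, RNNs, and long short-term memory (LSTM) networks in particular, are used to capture contextual information. They can process variable-length input sequences, which makes them suitable for line-level and word-level scene text recognition.

V. Models that combine CNNs and RNNs. To exploit local features and sequence information together, researchers have proposed models that combine CNNs and RNNs, such as CRNN (convolutional recurrent neural network). This kind of architecture performs well on text detection and recognition tasks.

VI. Further research directions. Despite notable progress, scene text recognition still faces many challenges, such as curved text, low-resolution images, and multilingual recognition. Future research will focus on improving model robustness, optimizing computational efficiency, and adapting to more complex scenes.

VII. Conclusion. Deep learning has substantially changed scene text recognition, but there is still room for improvement. As the technology continues to advance, more accurate and more capable text recognition systems can be expected. The paper summarizes the representative achievements of deep learning in scene text recognition and serves as a reference for researchers in the field.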

Scene Text Recognition Based on Deep Learning: A Brief Survey

Yuxin Chen

College of Computer Science

Inner Mongolia University

Hohhot, China

e-mail: cyx3292@163.com

Yunxue Shao

College of Computer Science

Inner Mongolia University

Hohhot, China

e-mail: csshyx@imu.edu.cn

Abstract-Scene text recognition is a general-purpose text recognition technology that has become a research hotspot in computer science in recent years. Compared with traditional document text recognition, scene text recognition is more complex in terms of font, text distribution, background, and so on, so traditional OCR technology can no longer cope with the new challenges. Meanwhile, deep learning has achieved good results in the field of image recognition. This paper therefore summarizes the representative achievements in scene text recognition based on deep learning methods.

Keywords-deep learning; scene text recognition; convolutional neural networks; recurrent neural network

I. INTRODUCTION

Text is one of the main means of information transmission and interaction among human beings and plays an indispensable role in our lives. Natural scene images contain a great deal of text, and extracting it can help us understand the images better. Text recognition in natural scenes therefore has both theoretical research value and practical application value.

In recent years, deep learning has developed rapidly and has come to play a leading role in the field of OCR. OCR technology based on deep learning has achieved significant improvements in both the accuracy and the efficiency of text recognition. In view of this, this paper summarizes the representative achievements in scene text recognition based on deep learning methods, in the hope of helping readers who are interested in deep learning and scene text recognition.

II. BACKGROUND KNOWLEDGE

A. Deep Learning Theory

Deep learning has been a very popular research direction in machine learning in recent years. It relies on deep network structures with multiple hidden layers. By combining low-level features, such a network forms more abstract high-level representations (attribute categories or features) and thereby discovers a distributed feature representation of the data. Deep learning transforms the raw data into higher-level, more abstract feature representations through a large number of non-linear transformations. With enough composed non-linear transformations, deep neural networks can learn very complex functions. Generally speaking, deep learning requires training on a large amount of data for the neural network to generalize well.
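As a rough illustration of how stacked non-linear transformations extend what a network can represent, the sketch below builds a two-layer network in plain Python that computes XOR, a function no single linear layer can represent. The weights are hand-picked for the illustration rather than learned, and the names `dense`, `relu`, `forward`, and `XOR_LAYERS` are assumptions introduced here, not taken from the paper.

```python
def dense(x, weights, biases):
    """Affine transformation: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    """Element-wise non-linearity."""
    return [max(0.0, v) for v in x]


def forward(x, layers):
    """Apply affine + ReLU for every hidden layer, then a final affine layer."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return dense(x, weights, biases)


# Hand-picked (not trained) weights, assumed for illustration only:
#   h1 = relu(a + b), h2 = relu(a + b - 1), output = h1 - 2 * h2
XOR_LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer, 2 units
    ([[1.0, -2.0]], [0.0]),                   # output layer, 1 unit
]

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, forward([a, b], XOR_LAYERS)[0])
```

In practice the weights would be learned from data, which is where the need for a large training set mentioned above comes in.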

B. Scene Text Recognition Process

A typical natural scene text processing pipeline consists of two main parts: text detection and text recognition. Text detection finds the text regions in the image and separates them from the original image. Text recognition then recognizes the text in the separated regions. This paper mainly introduces text recognition, which is usually divided into the following steps:

Preprocessing: The text regions obtained by the detection step are usually affected by factors such as noise. It is therefore necessary to preprocess the image before text recognition. Preprocessing usually includes denoising, image enhancement, and scaling.

Feature extraction: It is often difficult to achieve ideal results by recognizing words directly at the pixel level, so a set of features must be defined to represent the image. Commonly used features include edge features, stroke features, and structural features.

Recognition: The text recognition task can be regarded as a classification task in which each character represents a category. The recognizer takes the extracted features as input and outputs the corresponding characters or words. Common recognizers include random forests, support vector machines, and neural networks.

Figure 1. Scene text recognition process
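The three steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the method of any system surveyed here: denoising is a 3x3 median filter, enhancement is a contrast stretch, scaling is nearest-neighbour resampling, the edge feature is a simple mean absolute gradient, and a nearest-centroid classifier stands in for the random forest, SVM, or neural network recognizers mentioned in the text. All function names and the toy glyph data are hypothetical.

```python
from statistics import median


def preprocess(img, out_h, out_w):
    """Denoise (3x3 median), enhance (contrast stretch), scale (nearest neighbour)."""
    h, w = len(img), len(img[0])
    # Median filter; coordinates are clamped at the image border.
    den = [[median(img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            for x in range(w)] for y in range(h)]
    lo = min(min(row) for row in den)
    hi = max(max(row) for row in den)
    span = (hi - lo) or 1  # avoid division by zero on flat images
    enh = [[(v - lo) / span for v in row] for row in den]
    return [[enh[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def edge_features(img):
    """Mean absolute horizontal and vertical intensity differences."""
    h, w = len(img), len(img[0])
    gx = sum(abs(img[y][x + 1] - img[y][x]) for y in range(h) for x in range(w - 1))
    gy = sum(abs(img[y + 1][x] - img[y][x]) for y in range(h - 1) for x in range(w))
    return [gx / (h * (w - 1)), gy / ((h - 1) * w)]


def train_centroids(samples):
    """samples: list of (feature_vector, label); returns label -> mean vector."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def recognize(features, centroids):
    """Classify as the label whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(features, centroids[label])))


def describe(img):
    """Preprocessing followed by feature extraction."""
    return edge_features(preprocess(img, 3, 3))


# Toy 6x6 glyphs (hypothetical data): a vertical bar "I" and a horizontal bar "-".
VERTICAL = [[255 if x in (2, 3) else 0 for x in range(6)] for y in range(6)]
HORIZONTAL = [[255 if y in (2, 3) else 0 for x in range(6)] for y in range(6)]

centroids = train_centroids([(describe(VERTICAL), "I"), (describe(HORIZONTAL), "-")])

noisy = [row[:] for row in VERTICAL]
noisy[3][0] = 255  # one isolated noise pixel, removed by the median filter
print(recognize(describe(noisy), centroids))
```

Each character class is one category, so adding a new character amounts to adding labelled samples to `train_centroids`. A real recognizer would use far richer features and a stronger classifier.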

C. Research on Scene Text Recognition

In recent years, a number of articles on scene text recognition methods have been published in various academic journals. A review of the relevant literature shows that recognition methods can be roughly divided into two categories according to the algorithm used for classification: those based on traditional methods and those based on deep learning. The two categories are described as follows:

2019 IEEE 11th International Conference on Communication Software and Networks
