Deep Learning-Driven Optimization of a Faster RCNN Scene Text Recognition Algorithm


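This paper examines a scene text recognition algorithm based on the Faster Region-based Convolutional Neural Network (Faster R-CNN), addressing the problems that traditional optical character recognition (OCR) faces in industrial environments. Natural scene text recognition in industrial settings usually demands high accuracy and tolerance of complex backgrounds and irregular layouts, and traditional OCR methods often cannot meet these standards. Faster R-CNN is a deep learning framework that combines a region proposal network (RPN) with Fast R-CNN and performs object detection and classification efficiently, which is essential for locating and recognizing text regions. In text recognition, the feature extraction of the convolutional neural network captures rich texture and structural information from the image, which improves recognition accuracy.

The paper first proposes a new deep learning method that applies Faster R-CNN to scene text recognition. Through end-to-end learning, the method learns and refines candidate text regions automatically, with no need to set complex rules in advance or to design features by hand. Compared with traditional methods based on template matching or feature engineering, it is more flexible and robust, and can to some extent offset background noise and variation in font and text size.

The paper's contributions are:

1. **Deep learning driven**: the algorithm uses the adaptive learning ability of deep neural networks to better handle text in complex scenes, including tilted, occluded, and deformed text.
2. **Enhanced object detection**: the RPN generates high-quality text region proposals, which reduces false detections and missed detections.
3. **Efficiency and accuracy**: Faster R-CNN maintains high accuracy while reaching a relatively high recognition speed, meeting the needs of real-time industrial applications.
4. **Suited to industrial scenes**: the algorithm is optimized for the text recognition challenges common in industrial environments, which improves stability and practicality in real production scenes.

In summary, the paper proposes a deep learning method that improves scene text recognition in industrial settings, built on Faster R-CNN text detection and recognition. The experiments show that the algorithm has clear advantages on text recognition tasks in complex industrial environments, and it may help promote wider use of OCR technology in industry.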


Scene Text Recognition Algorithm Based on Faster RCNN

Boya Wang, Jianqing Xu, Junbao Li
Department of Automatic Test and Control, Harbin Institute of Technology
Harbin, China
Email: wangboya2016@163.com

Cong Hu
School of Electronic Engineering and Automation, Guilin University of Electronic Technology
Guilin 541004, China
Email: hucong@guet.edu.cn

Jeng-Shyang Pan
Fujian Provincial Key Lab of Big Data Mining and Applications, Fujian University of Technology
Fuzhou 350108, China
Email: Chinajspan@cc.kuas.edu.tw
corresponding author

Abstract—There is great demand in industry for natural scene text recognition technology. Traditional optical character recognition (OCR) technology requires a neat text layout and a clean background, and industrial production often fails to meet such standards. To address these problems of existing OCR technology, this paper proposes a new text recognition algorithm based on deep learning, using a convolutional neural network (Faster RCNN) to improve the correctness of text recognition. Compared with the conventional detection method, the correct rate of recognition based on the Faster RCNN model can reach 90.4%, and the correctness rate is 88.9%. Experiments show that the recognition method in this paper is effective.

Keywords—deep learning; scene text recognition; convolutional neural network

I. INTRODUCTION

Deep learning is a new field in machine learning research. By establishing a hierarchical model structure similar to the human brain, features are extracted from the input data level by level, so that a mapping from low-level signals to high-level semantics can be established [5]. In recent years, artificial intelligence has been widely applied to industrial production, gradually replacing manual work and opening a new direction for industrial automation. In industrial production, how to identify workpieces has become an important issue.

Text recognition in scene images from industrial production differs from text recognition in documents. Text in a document is generally arranged neatly on a single-colour background, and the document recognition rate already meets practical requirements. In factory production, the image background is more complex, the layout is messy, and characters may appear distorted, so traditional OCR cannot meet the requirements [6]. This paper therefore presents a scene text recognition algorithm based on deep learning, which uses a convolutional neural network to extract character features for character recognition [10] and uses a pre-trained model to improve accuracy. Experiments show that this method gives better recognition results on some characters that are more difficult to identify.

II. FASTER RCNN TEXT RECOGNITION PRINCIPLE

A. Faster RCNN convolutional neural network model

Unlike traditional feature extraction methods, a convolutional neural network extracts features with convolution kernels. Each neuron is connected to a local receptive field of the previous layer, and local features are computed by the convolution kernel. Sliding the convolution window produces a feature map, and each feature map shares one convolution kernel, which gives weight sharing and reduces the number of weights. The Faster RCNN network is mainly used to recognize two-dimensional images. Because the shared weights are obtained by supervised learning [1], manual feature design is avoided, and Faster RCNN has the advantage of learning the shared weights from the training data. A Faster RCNN network is usually divided into multiple layers of two kinds, convolution layers and pooling layers, and there can be several of each [1]. They are used, respectively, for extracting features and for processing the feature parameters.
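
The weight sharing described above can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the 6 x 6 input, the 3 x 3 edge kernel, and the helper names `conv2d_valid`, `max_pool2d`, and `weight_counts` are all assumptions chosen for illustration. One kernel slides over the image to produce one feature map, a pooling layer then reduces that map, and the last helper compares the weight count with and without sharing.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over the image (stride 1, no padding) to build one feature map.

    Like most deep learning frameworks, this computes cross-correlation
    (the kernel is not flipped).
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Every output position reuses the same kernel weights.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a window are dropped."""
    oh, ow = len(fmap) // size, len(fmap[0]) // size
    return [
        [max(fmap[r * size + i][c * size + j]
             for i in range(size) for j in range(size))
         for c in range(ow)]
        for r in range(oh)
    ]


def weight_counts(in_h, in_w, k):
    """Weights for one feature map: one shared kernel vs. a private kernel per position."""
    out_positions = (in_h - k + 1) * (in_w - k + 1)
    return k * k, out_positions * k * k


# Hypothetical 6 x 6 input whose intensity grows steadily from left to right.
image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
# Hypothetical 3 x 3 kernel that responds to horizontal intensity change.
kernel = [[1.0, 0.0, -1.0] for _ in range(3)]

feature_map = conv2d_valid(image, kernel)   # 4 x 4 feature map
pooled = max_pool2d(feature_map)            # 2 x 2 after pooling
shared, unshared = weight_counts(6, 6, 3)   # 9 weights shared vs. 144 unshared
```

Even at this toy size the shared kernel needs 9 weights where private per-position kernels would need 144, and the gap grows with the image size.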

Faster RCNN is a neural network model based on convolutional neural networks (CNN), proposed by Shaoqing Ren et al. [1]. It is widely used in image recognition and other fields. Faster RCNN contains the RPN and the Fast RCNN network. The basic idea of the RPN is to classify all possible candidate boxes on the extracted feature maps. The RPN structure is shown in Fig. 1. The initial feature extraction consists of several conv + relu layers. At each location, the classification layer (cls_score) outputs the probabilities that each of the 9 anchors belongs to the foreground or the background, and the window regression layer (bbox_pred) outputs the translation and scaling parameters to apply to the windows corresponding to the 9 anchors.
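
As a rough sketch of how the 9 anchors and the two output layers relate, the following plain Python generates the anchors at one location and applies a set of regression offsets to one of them. The text above gives only the number of anchors. The 3 scales and 3 aspect ratios used here are the defaults reported by Ren et al. [1] and are assumed, not confirmed, for this paper. The helper names `anchors_at` and `decode` are hypothetical.

```python
import math

SCALES = (128, 256, 512)   # anchor side lengths in pixels (assumed, defaults from Ren et al.)
RATIOS = (0.5, 1.0, 2.0)   # height / width aspect ratios (assumed)


def anchors_at(cx, cy):
    """Return the 9 anchors, as (cx, cy, w, h), centred on one feature-map location."""
    boxes = []
    for s in SCALES:
        for r in RATIOS:
            # Keep the area close to s * s while changing the aspect ratio.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            boxes.append((cx, cy, w, h))
    return boxes


def decode(anchor, deltas):
    """Apply bbox_pred-style offsets (tx, ty, tw, th) to one anchor.

    The centre is shifted in units of the anchor size and the width and
    height are rescaled exponentially, following the standard R-CNN
    box parameterization.
    """
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


k = len(SCALES) * len(RATIOS)   # 9 anchors per location
cls_score_channels = 2 * k      # foreground and background score per anchor
bbox_pred_channels = 4 * k      # four offsets per anchor

anchors = anchors_at(8.0, 8.0)
# Shift the square 256-pixel anchor right by 10% of its width, keeping its size.
shifted = decode(anchors[4], (0.1, 0.0, 0.0, 0.0))
```

Under these assumptions cls_score has 18 output channels per location and bbox_pred has 36, matching the "9 anchors, two class probabilities, four window parameters" description.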

The RPN is naturally implemented as a fully convolutional network and can be trained end to end by backpropagation and stochastic gradient descent (SGD) [1]. The network is trained with an "image-centric" sampling strategy. Each mini-batch comes from a single image that contains many positive and negative samples. We could optimize the loss function over all anchors, but this would bias towards negative samples because they are
