Scene Text Recognition Algorithm Based on Faster RCNN
Boya Wang, Jianqing Xu, Junbao Li
Department of Automatic Test and
Control, Harbin Institute of Technology
Harbin, China
Email: wangboya2016@163.com
Cong Hu
School of Electronic Engineering and
Automation, Guilin University of
Electronic Technology
Guilin 541004, China
Email: hucong@guet.edu.cn
Jeng-Shyang Pan*
Fujian Provincial Key Lab of Big Data
Mining and Applications, Fujian
University of Technology
Fuzhou 350108, China
Email: Chinajspan@cc.kuas.edu.tw
* corresponding author
Abstract—Industrial applications have a great demand for
technology that recognizes text in natural scenes. Traditional
optical character recognition (OCR) requires neatly laid-out text
on a clean background, and images from industrial production
often fail to meet such standards. To address these limitations
of OCR, this paper proposes a new text recognition method based
on deep learning, built on a convolutional neural network (Faster
RCNN), to improve the correctness of text recognition. Compared
with the conventional detection method, the correct rate of
recognition based on the Faster RCNN model can reach 90.4%, and
the correctness rate is 88.9%. Experiments show that the
recognition method in this paper is effective.
Keywords—deep learning; scene text recognition; convolutional
neural network
I. INTRODUCTION
Deep learning is a new field in machine learning research.
By building a hierarchical model structure similar to that of the
human brain, it extracts features from the input data level by
level, so that a mapping from low-level signals to high-level
semantics can be established [5]. In recent years, artificial
intelligence has been widely applied to industrial production,
gradually replacing manual labor and opening a new direction
for industrial automation. In industrial production, how to
identify workpieces has become an important issue.
Text recognition in scene images from industrial production
differs from text recognition in documents. The text in a
document is generally arranged neatly on a uniformly colored
background, and document recognition rates already meet
practical requirements. In factory production, the image
background is more complex, the layout is messy, and characters
may appear distorted, so traditional OCR cannot meet the
requirements [6]. This paper therefore presents a scene text
recognition algorithm based on deep learning, which uses a
convolutional neural network to extract character features for
character recognition [10] and uses a pre-trained model to
improve accuracy. Experiments show that this method achieves
better recognition results on some of the characters that are
more difficult to identify.
II. FASTER RCNN TEXT RECOGNITION PRINCIPLE
A. Faster RCNN convolutional neural network model
Unlike traditional feature extraction methods, a convolutional
neural network extracts features with convolution kernels. Each
neuron is connected to a local receptive field of the previous
layer, and the convolution kernel computes the local features of
that region. Sliding the convolution window over the input
produces a feature plane, and all neurons in a feature plane
share one convolution kernel; this weight sharing reduces the
number of weights. The Faster RCNN network is mainly used to
recognize two-dimensional images. Because the shared weights are
learned from the training data through supervised learning [1],
manual feature design is avoided. A Faster RCNN network is
usually divided into multiple layers of two kinds, convolution
layers and pooling layers, and there can be several of each [1].
They are used, respectively, to extract features and to process
the feature parameters.
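As a minimal sketch of the two layer types described above, the
following pure-Python example slides one shared kernel over a toy
image to build a feature plane and then applies max pooling. The
5x5 image, the 2x2 kernel, and the function names conv2d_valid and
max_pool2d are illustrative assumptions and are not taken from the
paper; max pooling is one common choice of pooling operation.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output neuron sees only a kh x kw local receptive field.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical toy input: a 5x5 image with a vertical edge, and a
# 2x2 kernel that responds to left-to-right intensity increases.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 feature plane
pooled = max_pool2d(feature_map)           # 2x2 after pooling

# Weight sharing: one kernel serves the whole feature plane.
shared_weights = len(kernel) * len(kernel[0])
# Weights needed if every neuron in the plane had its own kernel.
unshared_weights = shared_weights * len(feature_map) * len(feature_map[0])
```

In this toy case the shared kernel needs 4 weights, whereas giving
each of the 16 neurons in the feature plane its own kernel would
need 64, which illustrates why weight sharing reduces the number
of weights.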
Faster RCNN is a neural network model based on CNN, proposed by
Shaoqing Ren et al. [1]. It is widely used in image recognition
and other fields. Faster RCNN contains a region proposal network
(RPN) and the Fast RCNN network. The basic idea of the RPN is to
classify all possible candidate boxes on the extracted feature
maps. The RPN structure is shown in Fig. 1. The initial feature
extraction stage contains a number of conv + relu layers. At each
location, the classification layer (cls_score) outputs the
probabilities that each of the 9 anchors belongs to the
foreground or the background, and the window regression layer
(bbox_pred) outputs the translation and scaling parameters to be
applied to the windows corresponding to the 9 anchors.
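The anchor mechanism can be sketched as follows. The example
generates the 9 anchors at one location and applies a set of
regression outputs to one of them. The three scales (128, 256,
512) and three aspect ratios (1:2, 1:1, 2:1) follow the defaults
described by Ren et al. [1] and are an assumption here, since this
paper does not list them; the function names make_anchors and
apply_deltas are hypothetical.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchors (cx, cy, w, h) at one location."""
    anchors = []
    for s in scales:
        for r in ratios:
            # Keep the area at s * s while setting height / width = r.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx, cy, w, h))
    return anchors


def apply_deltas(anchor, deltas):
    """Translate and scale an anchor by regression outputs (dx, dy, dw, dh)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = deltas
    # Centre offsets are relative to the anchor size; sizes scale by exp().
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


anchors = make_anchors(cx=8, cy=8)
k = len(anchors)        # 9 anchors per location
cls_channels = 2 * k    # foreground/background score for each anchor
bbox_channels = 4 * k   # (dx, dy, dw, dh) for each anchor

# Shift the square 128 x 128 anchor to the right by 0.1 of its width.
refined = apply_deltas(anchors[1], (0.1, 0.0, 0.0, 0.0))
```

With 9 anchors per location, cls_score therefore has 18 output
channels and bbox_pred has 36 under this parametrization.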
The RPN is naturally implemented as a fully convolutional
network and is trained end to end by backpropagation and
stochastic gradient descent (SGD) [1]. The network is trained
with an "image-centric" sampling strategy: each mini-batch
comes from a single image that contains many positive and
negative samples. The loss function could be optimized over all
anchors, but this would bias it toward negative samples because they are