IncepText: A Structurally Optimized New Module for Multi-Oriented Scene Text Detection


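IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

In computer vision applications, multi-oriented scene text detection is a highly challenging task because text regions vary widely in scale, aspect ratio, and orientation. Conventional object detection methods often struggle to cope with this diversity. To address the problem, the authors propose IncepText, a novel end-to-end scene text detector. IncepText approaches the task from an instance-aware segmentation perspective and introduces a purpose-built Inception-Text module for multi-oriented text detection.

The Inception-Text module borrows the idea of the Inception architecture: it processes features in parallel branches at different scales to capture the range of shapes and sizes that text regions can take. This design improves recognition of text regions, especially in complex scenes with slanted, distorted, or small text. To adapt better to multiple orientations, the authors introduce deformable PSROI pooling, a learnable pooling strategy that adaptively adjusts the positions of its sampling windows to capture text features at different angles.

Experiments on ICDAR 2015, RCTW-17, and MSRA-TD500 show that IncepText outperforms existing methods in both effectiveness and efficiency. In particular, it took first place in the ICDAR 2015 challenge. Its instance-aware design lets it identify and localize text accurately while remaining efficient, which matters for practical applications such as document analysis, autonomous driving, and image search.

In summary, IncepText is a deep learning model that combines the Inception architecture with deformable PSROI pooling to address the difficulties of multi-oriented scene text detection. Its results on several benchmarks underline the value of modeling instance-level properties and tailoring the design to complex scenes. The approach may also benefit related areas such as text recognition and natural language understanding.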

IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

Qiangpeng Yang, Mengli Cheng, Wenmeng Zhou, Yan Chen, Minghui Qiu, Wei Lin, Wei Chu

Alibaba Group

{qiangpeng.yqp, mengli.cml, wenmeng.zwm, chenyan.cy, minghui.qmh, weilin.lw, weichu.cw}@alibaba-inc.com

Abstract

Incidental scene text detection, especially for multi-oriented text regions, is one of the most challenging tasks in many computer vision applications. Different from the common object detection task, scene text often suffers from a large variance of aspect ratio, scale, and orientation. To solve this problem, we propose a novel end-to-end scene text detector, IncepText, from an instance-aware segmentation perspective. We design a novel Inception-Text module and introduce deformable PSROI pooling to deal with multi-oriented text detection. Extensive experiments on the ICDAR2015, RCTW-17, and MSRA-TD500 datasets demonstrate our method's superiority in terms of both effectiveness and efficiency. Our proposed method achieves first place on the ICDAR2015 challenge and state-of-the-art performance on the other datasets. Moreover, we have released our implementation as an OCR product which is available for public access.

1 Introduction

Scene text detection is one of the most challenging tasks in many computer vision applications such as multilingual translation, image retrieval, and automatic driving. The first challenge is that scene text appears in many kinds of images, such as street views, posters, menus, and indoor scenes. Furthermore, scene text varies widely in both foreground text and background objects, as well as in lighting, blurring, and orientation.

In the past years, many outstanding approaches have focused on scene text detection. The key point of text detection is to design features that distinguish text from non-text regions. Most traditional methods, such as MSER [Neumann and Matas, 2010] and FASText [Busta et al., 2015], use manually designed text features. These methods are not robust enough to handle complex scene text. Recently, Convolutional Neural Network (CNN) based methods have achieved state-of-the-art results in text detection and recognition [He et al., 2016b; Tian et al., 2016; Zhou et al., 2017; He et al., 2017]. CNN-based models have a powerful capability for feature representation, and deeper CNN models are able to extract higher-level, more abstract features.

¹ OCR product: https://market.aliyun.com/products/57124001/cmapi020020.html

In the literature, there are mainly two types of approaches for scene text detection, namely indirect and direct regression. Indirect regression methods, such as CTPN [Tian et al., 2016] and RRPN [Ma et al., 2017], predict offsets from box proposals. These methods are based on the Faster-RCNN [Ren et al., 2015] framework. Recently, direct regression methods such as EAST [Zhou et al., 2017] and DDR [He et al., 2017] have achieved high performance for scene text detection. Direct regression usually performs boundary regression by predicting the offsets from a given point.
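The difference between the two families can be illustrated with a minimal sketch. The function names, the Faster R-CNN style offset parameterization, and the axis-aligned four-distance form are assumptions chosen for illustration; they are not the exact formulations used by CTPN, RRPN, EAST, or DDR.

```python
import math


def decode_indirect(proposal, deltas):
    """Indirect regression: refine a box proposal with predicted offsets.

    proposal: (cx, cy, w, h) of an anchor or region proposal.
    deltas:   (dx, dy, dw, dh), using a common Faster R-CNN style
              parameterization (an assumption for this sketch).
    Returns the refined (cx, cy, w, h).
    """
    cx, cy, w, h = proposal
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def decode_direct(point, distances):
    """Direct regression: predict the box boundary from a single point.

    point:     (x, y) location inside the text region.
    distances: (top, right, bottom, left) distances from the point to
               the four box sides (axis-aligned simplification).
    Returns (x_min, y_min, x_max, y_max).
    """
    x, y = point
    top, right, bottom, left = distances
    return (x - left, y - top, x + right, y + bottom)


# Hypothetical values: shift a 40x10 proposal right by a quarter of its width.
refined = decode_indirect((50.0, 20.0, 40.0, 10.0), (0.25, 0.0, 0.0, 0.0))
# Hypothetical values: a point that lies closer to the left edge of the text.
box = decode_direct((50.0, 20.0), (5.0, 30.0, 5.0, 10.0))
```

In the indirect case the quality of the result depends on how well the proposal already covers the text, whereas the direct case needs no proposal and only a point inside the region.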

In this paper, we solve this problem from an instance-aware segmentation perspective that mainly draws on the experience of FCIS [Li et al., 2016]. Different from common object detection, scene text often suffers from a large variance of scale, aspect ratio, and orientation. Therefore, we design a novel Inception-Text module to deal with these challenges. This module is inspired by the Inception module [Szegedy et al., 2015] in GoogLeNet: we choose multiple branches with different convolution kernels to deal with text of different aspect ratios and scales. At the end of each branch, we add a deformable convolution layer to adapt to multiple orientations. Another improvement is that we replace the PSROI pooling in FCIS with deformable PSROI pooling [Dai et al., 2017a]. According to our experiments, deformable PSROI pooling has better performance in the classification task.
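As a rough illustration of how deformable PSROI pooling differs from its fixed-grid counterpart, the following pure-Python sketch pools a k x k grid in which each bin reads from its own score map and may be shifted by a per-bin offset. It is a simplified sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, each bin takes a single bilinear sample at its center instead of an average, and the offset scaling (normalized offsets multiplied by `gamma = 0.1` and the RoI size) is assumed here for illustration.

```python
import math


def bilinear(fmap, x, y):
    """Bilinearly sample a 2D map (list of rows) at a fractional (x, y)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp to the valid area so shifted bins never read out of bounds.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = fmap[y0][x0] * (1 - ax) + fmap[y0][x1] * ax
    bottom = fmap[y1][x0] * (1 - ax) + fmap[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def deformable_psroi_pool(score_maps, roi, k, offsets=None, gamma=0.1):
    """Pool a k x k grid from position-sensitive score maps.

    score_maps: k*k 2D maps; bin (i, j) reads only from map i*k + j.
    roi:        (x_min, y_min, x_max, y_max) in feature-map coordinates.
    offsets:    optional list of k*k normalized (dx, dy) pairs; in a real
                network these would be predicted by a learned branch.
    gamma:      scale applied to the normalized offsets (assumed value).
    """
    x_min, y_min, x_max, y_max = roi
    roi_w, roi_h = x_max - x_min, y_max - y_min
    bin_w, bin_h = roi_w / k, roi_h / k
    pooled = []
    for i in range(k):          # bin row
        row = []
        for j in range(k):      # bin column
            cx = x_min + (j + 0.5) * bin_w
            cy = y_min + (i + 0.5) * bin_h
            if offsets is not None:
                dx, dy = offsets[i * k + j]
                # Scale by the RoI size so the shift is relative to the
                # region rather than expressed in absolute pixels.
                cx += gamma * dx * roi_w
                cy += gamma * dy * roi_h
            row.append(bilinear(score_maps[i * k + j], cx, cy))
        pooled.append(row)
    return pooled


# Toy input: four 4x4 score maps whose value equals the column index.
maps = [[[float(c) for c in range(4)] for _ in range(4)] for _ in range(4)]
plain = deformable_psroi_pool(maps, (0.0, 0.0, 4.0, 4.0), k=2)
shifted = deformable_psroi_pool(maps, (0.0, 0.0, 4.0, 4.0), k=2,
                                offsets=[(1.25, 0.0)] * 4)
```

With all offsets at zero (or omitted) the function reduces to ordinary fixed-grid PSROI pooling; nonzero offsets let each bin move toward where the text actually lies, which is the property that helps with rotated text regions.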

Our main contributions can be summarized as follows:

• We propose a new Inception-Text module for multi-oriented scene text detection. According to our experiments, this module shows a significant increase in accuracy with little computation cost.

• We propose to use a deformable PSROI pooling module to deal with multi-oriented text. A qualitative study of the learned offset parts in deformable PSROI pooling and quantitative evaluations show its efficiency in handling arbitrarily oriented scene text.

• We evaluate our proposed method on three public datasets, ICDAR2015, RCTW-17 and MSRA-TD500, and show that it achieves state-of-the-art performance on several benchmarks without using any extra data.

arXiv:1805.01167v2 [cs.CV] 8 May 2018
