Abstract
In this paper, we propose a novel method called
Rotational Region CNN (R²CNN) for detecting
arbitrary-oriented texts in natural scene images. The
framework is based on Faster R-CNN [1] architecture.
First, we use the Region Proposal Network (RPN) to
generate axis-aligned bounding boxes that enclose the texts
with different orientations. Second, for each axis-aligned
text box proposed by the RPN, we extract pooled features
with several pooled sizes, and the concatenated features
are used to simultaneously predict the text/non-text score,
the axis-aligned box, and the inclined minimum area box.
Finally, we use inclined non-maximum suppression to obtain
the detection results. Our approach achieves competitive
results on text detection benchmarks: ICDAR 2015 and
ICDAR 2013.
1. Introduction
Texts in natural scenes (e.g., street nameplates, store
names, goods names) play an important role in our daily life,
as they carry essential information about the environment.
Once scene texts are understood, they can be used in many
areas, such as text-based retrieval and translation. There
are usually two key steps to understand scene texts: text
detection and text recognition. This paper focuses on scene
text detection. Scene text detection is challenging because
scene texts have different sizes, width-height aspect ratios,
font styles, lighting, perspective distortion, orientation, etc.
As orientation information is useful for scene text
recognition and other tasks, scene text detection differs
from common object detection in that the text orientation
should also be predicted in addition to the axis-aligned
bounding box information.
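To make the extra output concrete: one common way to represent an inclined box (used here purely as an illustration; the paper's own parameterization may differ) is a center, size, and rotation angle, which map to the four corner points via a 2D rotation:

```python
import math

def inclined_box_corners(cx, cy, w, h, theta):
    """Corner points of a rotated rectangle given by center (cx, cy),
    size (w, h), and angle theta in radians. Purely illustrative; the
    function name and parameterization are assumptions, not the paper's."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each half-extent offset and translate by the center.
    return [(cx + c * dx - s * dy, cy + s * dx + c * dy) for dx, dy in half]
```

With theta = 0 this reduces to the ordinary axis-aligned box, which is why an oriented detector strictly generalizes a horizontal one.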
While most previous text detection methods are designed
for detecting horizontal or near-horizontal texts
[2,3,4,5,6,7,8,9,10,11,12,13,14], some methods try to
address the arbitrary-oriented text detection problem
[15,16,17,18,19,20,31,32,33,34]. Recently, arbitrary-oriented
scene text detection has become a hot research area, as can be
seen from the frequent result updates in the ICDAR 2015 Robust
Reading competition on incidental scene text detection [21].
While traditional text detection methods are based on sliding
windows or Connected Components (CCs) [2,3,4,6,10,13,17,18,19,20],
deep learning based methods have been widely studied recently
[7,8,9,12,15,16,31,32,33,34].

Fig. 1. The procedure of the proposed method R²CNN. (a) Original
input image; (b) text regions (axis-aligned bounding boxes)
generated by RPN; (c) predicted axis-aligned boxes and inclined
minimum area boxes (each inclined box is associated with an
axis-aligned box, and the associated box pair is indicated by the
same color); (d) detection result after inclined non-maximum
suppression.
This paper presents a Rotational Region CNN (R²CNN)
for detecting arbitrary-oriented scene texts. It is based on
Faster R-CNN architecture [1]. Figure 1 shows the
procedure of the proposed method. Figure 1(a) is the
original input image. We first use the RPN to propose
axis-aligned bounding boxes that enclose the texts (Figure
1(b)). Then we classify the proposals, refine the
axis-aligned boxes and predict the inclined minimum area
boxes with pooled features of different pooled sizes (Figure
1(c)). Finally, inclined non-maximum suppression is applied
to the detection candidates to produce the final detection
results (Figure 1(d)). Our method yields an
F-measure of 82.54% on ICDAR 2015 incidental text
detection benchmark and 87.73% on ICDAR 2013 focused
text detection benchmark.
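The final post-processing step above can be sketched in code. The following is an illustrative implementation under my own assumptions (inclined boxes given as four corner points in counter-clockwise order, rotated-box IoU computed by exact polygon clipping, and a hypothetical 0.3 overlap threshold); it is not the authors' implementation.

```python
# Sketch of inclined non-maximum suppression over rotated boxes.
# Each box is a list of 4 (x, y) corners in counter-clockwise order.

def _cross(o, a, b):
    # z-component of (a - o) x (b - o); >= 0 means b lies left of ray o->a.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def _intersect(s, e, a, b):
    # Intersection of segment s-e with the infinite line through a-b.
    d1, d2 = _cross(a, b, s), _cross(a, b, e)
    t = d1 / (d1 - d2)
    return (s[0] + t * (e[0] - s[0]), s[1] + t * (e[1] - s[1]))

def _clip_polygon(subject, clipper):
    # Sutherland-Hodgman clipping of `subject` by the convex CCW `clipper`.
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        for j in range(len(inp)):
            s, e = inp[j - 1], inp[j]
            s_in, e_in = _cross(a, b, s) >= 0, _cross(a, b, e) >= 0
            if e_in:
                if not s_in:
                    output.append(_intersect(s, e, a, b))
                output.append(e)
            elif s_in:
                output.append(_intersect(s, e, a, b))
        if not output:
            break
    return output

def _area(poly):
    # Shoelace formula for polygon area.
    return abs(sum(poly[i][0] * poly[(i + 1) % len(poly)][1]
                   - poly[(i + 1) % len(poly)][0] * poly[i][1]
                   for i in range(len(poly)))) / 2.0

def inclined_iou(p, q):
    inter = _clip_polygon(p, q)
    ia = _area(inter) if len(inter) >= 3 else 0.0
    return ia / (_area(p) + _area(q) - ia + 1e-9)

def inclined_nms(boxes, scores, thresh=0.3):
    # Greedy NMS: keep the highest-scoring box, discard boxes that
    # overlap a kept box by more than `thresh` in rotated IoU.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if inclined_iou(boxes[i], boxes[j]) < thresh]
    return keep
```

The only difference from standard NMS is the overlap measure: because candidates are rotated rectangles, the intersection area must be computed on the polygons themselves rather than on axis-aligned extents.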
R²CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
Yingying Jiang, Xiangyu Zhu, Xiaobing Wang, Shuli Yang, Wei Li, Hua Wang, Pei Fu and Zhenbo Luo
Samsung R&D Institute China - Beijing
{yy.jiang, xiangyu.zhu, x0106.wang, shuli.yang, wei2016.li, hua00.wang, pei.fu, zb.luo}@samsung.com
arXiv:1706.09579v2 [cs.CV] 30 Jun 2017