English Text Localization with Hierarchical Multiple Feature Learning

Updated 2024-08-29. PDF, 506 KB.

"这篇研究论文探讨了自然场景图像中的英文文本定位问题，提出了一种基于分层多特征学习的文本定位框架。该框架从字符到字符串再到单词，逐级进行定位，旨在设计简单但有效的特征和学习模型，不同于依赖复杂手工特征或大型学习模型的现有方法。在字符定位中，结合了梯度直方图（HOG）和卷积神经网络（CNN）特征，并引入了一种两级字符结构特征。在字符串定位阶段，提出了一个九维字符串特征，用于区分验证。" 在本文中，作者主要关注的是自然场景图像中的文本定位任务，这是一个在计算机视觉领域中的挑战性问题。传统的文本检测方法通常依赖于精细设计的手工特征，例如边缘、纹理和形状等，或者使用复杂的深度学习模型。然而，这些方法可能存在计算成本高、泛化能力弱等问题。为了克服这些挑战，作者提出了一个层次化的文本定位框架。这个框架分为三个层次：字符定位、字符串定位和单词定位。首先，他们利用二级字符结构特征，这些特征能够捕捉字符的局部和全局信息。二级结构意味着特征不仅考虑单个字符，还考虑相邻字符之间的关系，这对于识别和定位连续的字符序列至关重要。同时，他们结合了HOG和CNN特征，这两种特征在计算机视觉中已经证明了其在物体识别和定位上的有效性。HOG特征可以捕获图像的局部形状信息，而CNN特征则可以从深层学习中提取更抽象和高级的表示。在字符串定位阶段，作者提出了一种九维字符串特征，这种特征是专为区分验证设计的，目的是提高对文本字符串的识别准确性。这种特征可能包括字符串的长度、方向、空间分布等信息，有助于区分不同文本实例。此外，论文中提到的方法倾向于使用简单但高效的特征和学习模型，这可能是为了降低计算复杂性，提高实时性能，以及增强模型在未见过的数据上的泛化能力。通过这种方法，作者可能期望在保持良好性能的同时，减少对大量标注数据和计算资源的依赖。这篇研究论文为文本本地化提供了一个创新的解决方案，它结合了多种特征和层次化学习策略，以解决自然场景图像中的文本检测问题。这种方法有望在实际应用中提高文本检测的准确性和效率。


Text Localization with Hierarchical Multiple Feature Learning

Yanyun Qu, Li Lin, Weiming Liao, Junran Liu, Yang Wu, Hanzi Wang

Computer Science Department, Xiamen University, Xiamen, China

{quyanyun, linlipj, liaoweimin0909, ilevanaliu, wang.hz}@gmail.com

Center for Frontier Science and Technology, Nara Institute of Science and Technology, Nara, Japan

wuyang0321@gmail.com

Abstract. In this paper, we focus on English text localization in natural scene images. We propose a hierarchical localization framework that proceeds from characters to strings to words. Unlike existing methods, which either bet on sophisticated hand-crafted features or rely on heavy learning models, our approach favors simple but effective features and learning models. We introduce two-level character structure features, used in collaboration with Histogram of Oriented Gradients (HOG) and Convolutional Neural Network (CNN) features, for character localization. For string localization, we propose a nine-dimensional string feature for discriminative verification after grouping characters. For the final word localization, we learn an optimal splitting strategy based on interval cues to split strings into words. Experiments on the challenging ICDAR benchmark datasets demonstrate the effectiveness and superiority of our approach.

Keywords: Hierarchical framework, character structure feature, string feature, Convolutional Neural Network, text localization
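The final step named in the abstract, splitting strings into words from interval cues, can be sketched as below. The paper learns its splitting strategy; this sketch substitutes a fixed rule (split where a gap exceeds `ratio` times the median gap), and `split_into_words`, `ratio`, and `min_gap` are hypothetical names chosen for illustration.

```python
from statistics import median
from typing import List, Tuple

Span = Tuple[float, float]  # (left, right) extent of one character in a string


def split_into_words(chars: List[Span],
                     ratio: float = 2.0,
                     min_gap: float = 1.0) -> List[List[Span]]:
    """Split one text string into words at unusually wide character gaps.

    Assumption: a fixed threshold replaces the learned splitting strategy.
    A gap starts a new word when it exceeds max(ratio * median gap, min_gap).
    """
    if not chars:
        return []
    chars = sorted(chars)
    # Horizontal intervals between consecutive characters (overlaps count as 0).
    gaps = [max(b[0] - a[1], 0.0) for a, b in zip(chars, chars[1:])]
    if not gaps:
        return [chars]
    threshold = max(ratio * median(gaps), min_gap)
    words: List[List[Span]] = [[chars[0]]]
    for gap, c in zip(gaps, chars[1:]):
        if gap > threshold:
            words.append([c])      # wide interval: begin a new word
        else:
            words[-1].append(c)    # narrow interval: same word
    return words


line = [(0, 10), (12, 22), (24, 34), (50, 60), (62, 72)]
words = split_into_words(line)
```

A median-based threshold adapts to font size but cannot split a string whose gaps are all similar, which is one reason a learned strategy may be preferable.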

1 Introduction

With the development of multimedia technology and the popularity of digital imaging devices (such as digital cameras), vast numbers of natural scene images, which carry a wealth of information, are collected and stored. Among all the information contained in an image, text, as a strong high-level semantic resource, provides valuable cues about image content. Text is therefore very important for both humans and computers in understanding scenes. Judd [1] showed that people, given an image, tend to fixate on text more than on other objects, which suggests the importance of text to humans. Text recognition is also critical in intelligent navigation, movie summarization, vision assistance systems, etc. As a result, there is an urgent need to develop technology for text recognition in natural scene images.

Text recognition is usually divided into two tasks: text localization and word recognition. Text localization is an important prerequisite for word recognition. Text localization, as an important task among the renowned competitions held in the International
