Text Detection and Recognition in Color Imagery: An In-Depth Survey and Challenges

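The technical challenges, methods, and performance of text detection and recognition in imagery form a key topic in information technology. This document, a survey paper by Qixiang Ye and David Doermann, analyzes the difficulties of detecting and recognizing text in color imagery and reviews the existing techniques in depth. The authors first define the core problem: localizing, verifying, segmenting, and recognizing text in color images. They distinguish two main technical routes: stepwise methods, which handle each sub-task in turn (for example, edge detection first, then localization of candidate regions, then character recognition), and integrated methods, which try to solve the whole problem end to end and reduce the number of intermediate stages.

The challenges in text detection include accurately identifying text boundaries under complex backgrounds and lighting conditions, especially in low-contrast or blurred scenes. Handling multi-oriented text, perspective distortion, and multilingual text is a further key difficulty. Enhancement techniques, which improve text readability, and video text analysis are also research focuses.

Text localization involves precisely locating text lines or characters, which may require accounting for variations in font size, shape, and skew angle. Text verification confirms whether a detected region really contains meaningful text rather than noise or other image content. Segmentation separates contiguous text characters so that they can be processed individually.

In the recognition stage, models must cope with the diversity of fonts, character sets, and layouts. Traditional OCR (Optical Character Recognition) techniques, such as methods based on template matching or machine learning, and modern deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), continue to improve recognition accuracy. The article lists several benchmark datasets, such as the IAM handwriting database and the ICDAR competition datasets, for evaluating and comparing the performance of different methods. By comparing the most representative methods, the paper aims to provide a comprehensive framework for understanding and addressing the current shortcomings of the field.

In summary, this survey examines the state of the art in text detection and recognition in color imagery, discusses the key sub-problems and their solutions, and offers useful pointers for future research. It is a valuable resource for researchers and engineers working in this area.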

By contrast, the goal of integrated methodologies is to identify specific words in imagery with character and language models. Integrated methodologies can avoid the challenging segmentation step or optimize it with character and word recognition, which makes them less sensitive to complex backgrounds and low resolution text. The disadvantage is that the multi-class character classification procedure is computationally expensive when there are many character classes and many candidate windows. In addition, an increase in the number of word classes could significantly decrease detection and recognition performance, so the generality is often limited to a small lexicon of words.

4 FUNDAMENTAL SUB-PROBLEMS

In this section, sub-problems including text localization, verification, segmentation, and recognition are described. Each approach is reviewed with respect to its primary contribution. The approaches that make multiple contributions are analyzed with respect to each contribution.

4.1 Text Localization

The objective of text localization is to localize text components precisely as well as to group them into candidate text regions with as little background as possible. For text localization, connected component analysis (CCA) and sliding window classification are two widely used methods, and color, edges, strokes, and texture are typically used as features.

4.1.1 Methods

Connected component analysis. CCA could be regarded as a graph algorithm, where subsets of connected components are uniquely labeled based on heuristics about feature consensus, i.e., color similarity and spatial layout. In implementations of CCA, syntactic pattern recognition methods are often used to analyze the spatial and feature consensus, and to define text regions. Considering the complexity of fine-tuning the syntactic rules, a new trend is to perform CCA with statistical models [109], [138], [182], e.g., using an AdaBoost classifier on pairwise spatial features to learn the CCA models [182]. The use of statistical models in CCA significantly improves its adaptivity.
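
The labeling step described above can be illustrated with a minimal sketch. It is not the method of any cited work: it assumes a grayscale image stored as a list of rows, 4-connectivity, and a hypothetical tolerance `tol` that stands in for the "color similarity" heuristic. Spatial layout rules and learned models are left out.

```python
from collections import deque

def label_components(image, tol=10):
    """Label 4-connected pixels whose gray values differ by at most `tol`.

    Similarity is tested between neighboring pixels only, so slow gradients
    can chain into one component (a known limitation of this simple rule).
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not labeled yet"
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and abs(image[ny][nx] - image[y][x]) <= tol):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label

# Toy image: light background (200), one dark stroke (20), one dark dot (25).
image = [
    [200, 200,  20,  20, 200],
    [200, 200,  20, 200, 200],
    [200, 200, 200, 200,  25],
]
labels, count = label_components(image)
print(count)
```

In a text localization pipeline, the labeled components would then be filtered and grouped by layout rules or by a learned classifier, which this sketch does not attempt.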

Sliding window classification. In the sliding window classification method, multi-scale image windows that are classified into positives are further grouped into text regions with morphological operations [130], CRF [148] or graph methods [123], [173]. The advantage of this method lies in the simple and adaptive training-detection architecture. Nevertheless, it is often computationally expensive when complex classification methods are used and a large number of windows need to be classified.
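
To make the cost argument concrete, the following sketch enumerates multi-scale windows and applies a classifier to each. The window sizes, the stride ratio, the `toy_classifier` rule, and the bounding-box grouping are all illustrative assumptions. In particular, the grouping here is a crude stand-in for the morphological, CRF, or graph-based grouping mentioned above.

```python
def sliding_windows(width, height, base=16, scales=(1, 2, 4), stride_ratio=0.5):
    """Yield (x, y, size) for square windows at several scales."""
    for scale in scales:
        size = base * scale
        stride = max(1, int(size * stride_ratio))
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield x, y, size

def toy_classifier(x, y, size):
    """Hypothetical classifier: pretend text fills the top 16-pixel band."""
    return size == 16 and y == 0

def bounding_box(windows):
    """Merge positive windows into a single enclosing box (x0, y0, x1, y1)."""
    return (min(x for x, _, _ in windows),
            min(y for _, y, _ in windows),
            max(x + s for x, _, s in windows),
            max(y + s for _, y, s in windows))

windows = list(sliding_windows(64, 64))
positives = [w for w in windows if toy_classifier(*w)]
print(len(windows), len(positives), bounding_box(positives))
```

Even this 64 x 64 toy image produces dozens of windows at three scales. The count grows with image area and with the number of scales, which is why an expensive classifier per window becomes the bottleneck.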

4.1.2 Features

For text localization, color [174], edge [28] and texture features [19] were conventionally used, and stroke [47], [107], [163], point [152], region [137], [138], [150], [164], [182] and character appearance features [94], [196], [198], [199] have recently been explored.

Color features. Text is often produced in a consistent and distinguishable color so that it contrasts with the background [40]. Under this assumption, color features could be used to localize text [2], [22], [54], [63], [82], [92], [96], [109], [150]. Color-based text localization, a method about 20 years old, is often simple and efficient, although it is sensitive to multi-color characters and uneven lighting, which can seriously degrade color features.

An early color-based text localization approach is from Jain and Yu [2]. They used color reduction to generate color layers, a clustering algorithm to obtain CCs, and connected CCs into text candidates with color similarity and component layout analysis. In other work [95], it was shown that the use of a mean-shift algorithm to generate color layers could improve the robustness to complex backgrounds.

To be adaptive to color variation, color features are extracted in converted or combined color spaces or described with mixture models [27], [74], [76], [174]. In [7], Garcia and Apostolidis performed text extraction with a k-means clustering algorithm in the hue-saturation-value (HSV) color space. Karatzas and Antonacopoulos [33] extracted text components with a split-and-merge strategy in the hue-lightness-saturation (HLS) color space. Chen et al. [26] proposed using Gaussian mixture models in R, G, B, hue and intensity channels to localize text.
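
As a rough illustration of clustering pixels in a converted color space, the sketch below runs a small k-means on HSV values. It is not the procedure of [7]. It assumes two clusters, hand-picked seed pixels, and a cone embedding (s·cos h, s·sin h, v), chosen here so that the circular hue axis and the unstable hue of near-gray pixels do not distort distances.

```python
import colorsys
import math

def hsv_cone(rgb):
    """Map an 8-bit RGB triple to (s*cos(h), s*sin(h), v) in HSV space."""
    h, s, v = colorsys.rgb_to_hsv(*(ch / 255 for ch in rgb))
    angle = 2 * math.pi * h
    return (s * math.cos(angle), s * math.sin(angle), v)

def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])))

def kmeans(points, centers, iterations=10):
    """Plain k-means with caller-supplied initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep empty ones.
        centers = [tuple(sum(d) / len(g) for d in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return [nearest(p, centers) for p in points], centers

# Toy pixels: near-white "text" and blue "background".
pixels = [(250, 250, 250), (245, 245, 240), (10, 20, 200),
          (20, 30, 210), (240, 240, 245), (15, 25, 190)]
points = [hsv_cone(p) for p in pixels]
assignment, centers = kmeans(points, [points[0], points[2]])
print(assignment)
```

Each cluster then corresponds to one color layer, on which connected components could be extracted.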

Edge/Gradient features. The family of edge/gradient-based approaches assumes that text exhibits a strong and symmetric gradient against its background. Thus, those pixels with large and symmetric gradient values could be regarded as text components. In [4], [12], [23], [27], [80], [114], [177], [181] edge features are used to detect text components, and in [12], [24], [71], [98] gradient features are used.
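
The "strong and symmetric gradient" assumption can be sketched on a single scanline. The code below is an illustrative simplification rather than any cited detector: it uses a forward difference as the gradient, and the `threshold`, `max_width`, and `min_ratio` parameters are hypothetical values chosen for the toy data.

```python
def forward_gradient(row):
    """Gradient between neighboring pixels: g[i] = row[i + 1] - row[i]."""
    return [row[i + 1] - row[i] for i in range(len(row) - 1)]

def symmetric_edge_pairs(row, threshold=40, max_width=6, min_ratio=0.5):
    """Pair each strong edge with the next strong edge of opposite sign.

    A pair (a, b) is kept when the two edges are at most `max_width` apart
    and their magnitudes are similar (smaller / larger >= `min_ratio`).
    """
    grad = forward_gradient(row)
    strong = [i for i, g in enumerate(grad) if abs(g) >= threshold]
    pairs = []
    for a in strong:
        for b in strong:
            if not 0 < b - a <= max_width:
                continue
            opposite = grad[a] * grad[b] < 0
            ratio = (min(abs(grad[a]), abs(grad[b]))
                     / max(abs(grad[a]), abs(grad[b])))
            if opposite and ratio >= min_ratio:
                pairs.append((a, b))
                break  # keep only the closest matching edge
    return pairs

stroke = [200, 200, 200, 30, 30, 30, 200, 200, 200]  # dark stroke on light
step = [30, 30, 30, 200, 200, 200]                   # one-sided edge only
print(symmetric_edge_pairs(stroke), symmetric_edge_pairs(step))
```

The stroke yields one edge pair, while the one-sided step edge yields none. A cluttered background with many strong, closely spaced edges would also produce pairs, which is where this assumption becomes unreliable.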

Wu et al. [4] proposed using Gaussian derivatives to extract horizontally aligned vertical edges, which are aggregated to produce chips corresponding to text strings if "short paths" exist between edge pairs. In recent work [167], Phan et al. proposed grouping horizontally aligned components of "gradient vector flow" into text candidates based on spatial constraints of sizes, positions and color distances.

Compared with color features, gradient/edge features are less sensitive to uneven lighting and multi-color characters [9]. They are combined with classifiers such as artificial neural networks [14], [16] or AdaBoost [28], [68] to perform sliding window based text localization. However, they often have difficulty discriminating text components from complex backgrounds that have strong gradients.

Texture features. When characters are dense, text could be considered as a texture [29]. Texture features including the Fourier transform [116], discrete cosine transform (DCT) [8], wavelets [5], [49], LBP, and HOG [113] have been used to localize text. Such features are usually combined with a multi-scale sliding window classification method to perform text localization. Texture features are effective for detecting dense characters, although they might not detect sparse characters, e.g., signs in scene images, which lack significant texture properties.
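
As one example of a texture descriptor from the list above, the sketch below computes the basic 3 x 3 local binary pattern (LBP) and its histogram over a window. It assumes the common convention that a neighbor contributes a 1 bit when it is greater than or equal to the center. The neighbor ordering in `OFFSETS` is an arbitrary choice, and the source does not specify which LBP variant the surveyed methods use.

```python
# Clockwise neighbor offsets (dy, dx), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """8-bit LBP code of the interior pixel at row r, column c."""
    center = image[r][c]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if image[r + dy][c + dx] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

flat = [[5] * 4 for _ in range(4)]          # no texture at all
peak = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]    # isolated bright pixel
print(lbp_histogram(flat)[255], lbp_code(peak, 1, 1))
```

A window-level histogram like this could serve as the feature vector fed to a sliding window classifier.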

Li et al. pioneered the text localization method with Wavelet texture features [5]. They proposed using mean, second and third order central moments of wavelet coefficients and a neural network to classify image windows, of

1484 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 37, NO. 7, JULY 2015
