TextSnake: A New Method for Detecting Text of Arbitrary Shapes



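“TextSnake - A Flexible Representation for Detecting Text of Arbitrary Shapes.pdf”

Driven by advances in deep learning and the availability of large-scale datasets, scene text detection and recognition have progressed rapidly in recent years, with results on standard benchmarks improving steadily. Existing methods, however, are limited by how they describe text: axis-aligned rectangles, rotated rectangles, or quadrangles. These shapes fall short for free-form text instances such as curved text, which are common in real-world scenes. To address this, the paper proposes TextSnake, a more flexible representation that can describe horizontal, oriented, and curved text instances.

In TextSnake, a text instance is described as a sequence of ordered, overlapping disks whose centers lie on the text center line; each disk carries a radius and an orientation. Each disk covers part of the text region, and together the disks can follow text of arbitrary shape. Compared with a rectangular box, this captures the shape of bent or distorted text more precisely.

According to the paper, the detector is built on a fully convolutional network (FCN) that predicts the text regions, the text center lines, and the geometry attributes of the disks (radius and orientation); text instances are then reconstructed from these predictions. The network is trained with supervision on annotated text instances.

TextSnake improves detection of arbitrarily shaped text and is particularly suited to free-form text in real scenes. Possible application areas include autonomous driving, image understanding, and intelligent surveillance.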


Shangbang Long et al.

3 Methodology

In this section, we first introduce the new representation for text of arbitrary shapes. Then we describe our method and training details.

3.1 Representation

Fig. 2. Illustration of the proposed TextSnake representation. Text region (in yellow) is represented as a series of ordered disks (in blue), each of which is located at the center line (in green, a.k.a. symmetric axis or skeleton) and associated with a radius r and an orientation θ. In contrast to conventional representations (e.g., axis-aligned rectangles, rotated rectangles and quadrangles), TextSnake is more flexible and general, since it can precisely describe text of different forms, regardless of shapes and lengths.

As shown in Fig. 1, conventional representations for scene text (e.g., axis-aligned rectangles, rotated rectangles and quadrangles) fail to precisely describe the geometric properties of text instances of irregular shapes, since they generally assume that text instances are roughly in linear forms, which does not hold true for curved text. To address this problem, we propose a flexible and general representation: TextSnake. As demonstrated in Fig. 2, TextSnake expresses a text instance as a sequence of overlapping disks, each of which is located at the center line and associated with a radius and an orientation. Intuitively, TextSnake is able to change its shape to adapt for the variations of text instances, such as rotation, scaling and bending.
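To make the disk representation concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the `Disk` class, the helpers `disks_along_arc`, `rasterize`, and `bounding_box_area`, and all numeric values are hypothetical. It places ordered, overlapping disks on a semicircular center line, fills the union of the disks, and compares the covered area with the tightest axis-aligned rectangle around the same region.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    """One disk of the sequence: a center on the center line, a radius, an orientation."""
    cx: float     # center x
    cy: float     # center y
    r: float      # radius
    theta: float  # orientation of the center line at the center, in radians


def disks_along_arc(center, arc_radius, start, end, half_width, n):
    """Hypothetical helper: place n ordered disks on a circular arc (n >= 2)."""
    disks = []
    for i in range(n):
        phi = start + (end - start) * i / (n - 1)
        cx = center[0] + arc_radius * math.cos(phi)
        cy = center[1] + arc_radius * math.sin(phi)
        # The tangent of a circle is perpendicular to the radial direction.
        disks.append(Disk(cx, cy, half_width, phi + math.pi / 2))
    return disks


def rasterize(disks, height, width):
    """Return a height x width grid of booleans, True inside the union of all disks."""
    mask = [[False] * width for _ in range(height)]
    for d in disks:
        # Only visit pixels inside the disk's bounding square.
        y0 = max(0, math.floor(d.cy - d.r))
        y1 = min(height - 1, math.ceil(d.cy + d.r))
        x0 = max(0, math.floor(d.cx - d.r))
        x1 = min(width - 1, math.ceil(d.cx + d.r))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                if (x - d.cx) ** 2 + (y - d.cy) ** 2 <= d.r ** 2:
                    mask[y][x] = True
    return mask


def bounding_box_area(mask):
    """Area of the tightest axis-aligned rectangle around all True pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (rows[-1] - rows[0] + 1) * (cols[-1] - cols[0] + 1)


# A semicircular "text" instance: 25 disks of radius 10 on an arc of radius 60.
snake = disks_along_arc((100.0, 100.0), 60.0, math.pi, 2 * math.pi, 10.0, 25)
mask = rasterize(snake, 200, 200)
region_area = sum(sum(row) for row in mask)
box_area = bounding_box_area(mask)
print(f"disk-union area: {region_area}, bounding-box area: {box_area}")
```

For this curved instance the rectangle covers far more pixels than the disk union, which is the imprecision the passage above attributes to rectangular representations. The disks overlap because neighboring centers are closer together than the disk diameter.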

Mathematically, a text instance t, consisting of several characters, can be viewed as an ordered list S(t). $S(t) = \{D_1, D_2, \cdots, D_n\}$, where $D_i$ stands for the ith disk and n is the number of the disks. Each disk D is associated with a group of geometry attributes, i.e. D = (c, r, θ), in which c, r and θ are the center, radius and orientation of disk D, respectively. The radius r is defined as half of the local width of t, while the orientation θ is the tangential
