COCO-Text数据集：推动文本检测与识别的基准

需积分: 27 85 浏览量更新于2024-09-07 收藏 5.76MB PDF 举报

"COCO OCR数据集是一个专门用于文本检测和识别的大型数据集，源自MSCOCO数据集，旨在推动自然图像中的文本检测和识别技术的发展。它包含各种复杂日常场景的图片，并且对图像中的文本进行了多维度的标注，包括边界框定位、机器印刷文本与手写文本分类、可读与不可读文本分类以及文字的脚本类型等。" COCO OCR数据集是计算机视觉领域的一个重要资源，尤其对于光学字符识别（OCR）技术的研究和开发有着显著的促进作用。该数据集的创建者们来自康奈尔大学、康奈尔科技以及捷克科技大学，他们注意到像SUN和ImageNet这样的大规模数据集在推动场景理解和对象识别的进步方面起到了关键作用。COCO-Text的目标是进一步提升自然图像中的文本检测和识别的准确性和效率。 COCO-Text的数据基础是MSCOCO数据集，一个包含复杂日常场景图像的集合，这些图像并不是专门为文本而收集的，因此它们包含了各种各样的文本实例，这使得COCO-Text具有广泛的多样性和真实性。在标注上，COCO-Text提供了四个主要的标注信息： 1. **位置标注**：通过边界框来精确地标注出图像中每个文本实例的位置，这对于文本检测算法的训练至关重要。 2. **文本类型分类**：将文本分为机器印刷和手写两种类别，以适应不同环境和应用场景下的文本识别需求。 3. **可读性分类**：区分出图像中的文本是可读还是不可读，这一特性有助于识别算法专注于可理解的文本，避免无效计算。 4. **脚本类型**：标注文本的脚本，如拉丁文、汉字、阿拉伯文等，这有助于支持多语言的OCR系统开发。这个数据集的创建不仅为学术研究提供了丰富的素材，也对实际应用中的OCR技术改进提供了基础。研究人员可以利用COCO-Text进行算法的训练和测试，以提高在自然图像中的文本检测和识别的准确性和鲁棒性。同时，由于其多样性和全面性，COCO-Text也是评估新算法性能的理想基准，推动了整个OCR领域的进步。

COCO-Text: Dataset and Benchmark for Text Detection and Recognition in

Natural Images

Andreas Veit

1,2

, Tom

s Matera

, Luk

s Neumann

, Ji

ı Matas

, Serge Belongie

1,2

Department of Computer Science, Cornell University

Cornell Tech

Department of Cybernetics, Czech Technical University, Prague

{av443,sjb344}@cornell.edu,

tomas@matera.cz,

{neumalu1,matas}@cmp.felk.cvut.cz

Abstract

This paper describes the COCO-Text dataset. In recent

years large-scale datasets like SUN and Imagenet drove the

advancement of scene understanding and object recogni-

tion. The goal of COCO-Text is to advance state-of-the-art

in text detection and recognition in natural images. The

dataset is based on the MS COCO dataset, which contains

images of complex everyday scenes. The images were not

collected with text in mind and thus contain a broad vari-

ety of text instances. To reﬂect the diversity of text in natu-

ral scenes, we annotate text with (a) location in terms of a

bounding box, (b) ﬁne-grained classiﬁcation into machine

printed text and handwritten text, (c) classiﬁcation into leg-

ible and illegible text, (d) script of the text and (e) tran-

scriptions of legible text. The dataset contains over 173k

text annotations in over 63k images. We provide a statis-

tical analysis of the accuracy of our annotations. In addi-

tion, we present an analysis of three leading state-of-the-art

photo Optical Character Recognition (OCR) approaches on

our dataset. While scene text detection and recognition en-

joys strong advances in recent years, we identify signiﬁcant

shortcomings motivating future work.

1. Introduction

The detection and recognition of scene text in natural

images with unconstrained environments remains a chal-

lenging problem in computer vision. The ability to ro-

bustly read text in unconstrained scenes can signiﬁcantly

help with numerous real-world application, e.g., assistive

technology for the visually impaired, robot navigation and

geo-localization. The problem of detecting and recogniz-

ing text in scenes has received increasing attention from the

computer vision community in recent years [1, 6, 14, 19].

The evaluation is typically done on datasets comprising im-

ages with mostly iconic text and containing hundreds of im-

ages at best.

a group of people point towards a green bus.

MS COCO image and annotations

COCO-Text annotations

1) 'mins',

legible

machine printed

English

2) 'Transport'

legible

machine printed

English

OCR results:

CORRECT

1) 'sweet potato',

legible

handwritten

English

2) '2.20'

legible

handwritten

English

3) '1.50'

legible

handwritten

English

...

OCR results:

INCORRECT

huge barrels of vegetables with people near them.

1) ‘mins’

1) ‘sweet potato’

2) ‘2.20’

3) ‘1.50’

2) ‘Transport’

Figure 1. Left: Example MS COCO images with object segmen-

tation and captions. Right: COCO-Text annotations. For the top

image, the photo OCR ﬁnds and recognizes the text printed on the

bus. For the bottom image, the OCR does not recognize the hand-

written price tags on the fruit stand.

To advance the understanding of text in unconstrained

scenes we present a new large-scale dataset for text in nat-

ural images. The dataset is based on the Microsoft COCO

dataset [10] that annotates common objects in their natural

contexts. Combining rich text annotations and object an-

notations in natural images provides a great opportunity for

research in scene text detection and recognition. MS COCO

was not collected with text in mind and is thus potentially

less biased towards text. Further, combining text with object

annotations allows for contextual reasoning about scene text

and objects. During a pilot study of state-of-the-art photo

OCR methods on MS COCO, we made two key observa-

tions: First, text in natural scenes is very diverse ranging

arXiv:1601.07140v2 [cs.CV] 19 Jun 2016

下载后可阅读完整内容，剩余7页未读，立即下载

zhouxiangyong88

粉丝: 2
资源: 110

COCO-Text数据集：推动文本检测与识别的基准

Synthetic_Chinese_String_Dataset 中文识别数据集 的label 和ctpn的训练集

TextGenerator:OCR数据集文本检测数据集字体分类数据集生成器

BankCardOCR:基于CTPN和CRNN实现的银行卡号识别系统

OCR_DataSet:收集并整理有关OCR的数据集并统一标注格式，刹车实验需要

ICDAR2017数据集说明

多模态+大模型+LLaVA1.5数据集

骑手头盔检测 车牌识别数据集 JPG（120张图像）

COCO-TEXT.rar

static_hrnet18s_ocr48_cocolvis

CoCo一键截图转文字识别器v1.0.0

最新资源

Synthetic_Chinese_String_Dataset 中文识别数据集的label 和ctpn的训练集

骑手头盔检测车牌识别数据集 JPG（120张图像）