The data set contains 3,425 videos of 1,595 different people. The shortest clip duration is 48 frames, the longest clip is 6,070 frames, and
the average length of a video clip is 181.3 frames.
The Microsoft Research Cambridge-12 Kinect gesture data set consists of sequences of human movements, represented as body-part
locations, and the associated gesture to be recognized by the system.
This dataset contains 250 pedestrian image pairs plus 775 additional images captured in a busy underground station, for research on person re-identification.
Face tracks, features, and shot boundaries from our CVPR 2013 paper, obtained from 6 episodes of Buffy the Vampire Slayer and 6 episodes of The Big Bang Theory.
ChokePoint is a video dataset designed for experiments in person identification/verification under real-world surveillance conditions. The
dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2.
Tracking
Walking pedestrians in busy scenarios from a bird's-eye view
Three pedestrian crossing sequences
The set was recorded in Zurich, using a pair of cameras mounted on a mobile platform. It contains 12,298 annotated pedestrians in roughly 2,000 frames.
BMP image sequences.
Data sets for tracking in aerial image sequences.
The MIT traffic data set is for research on activity analysis in crowded scenes. It includes a 90-minute traffic video sequence recorded by a stationary camera.
Segmentation
Ground truth database of 50 images with: Data, Segmentation, Labelling - Lasso, Labelling - Rectangle
Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets
Cows for object segmentation; five video sequences for motion segmentation
Geometric Context Dataset: pixel labels for seven geometric classes for 300 images
This dataset contains videos of crowds and other high-density moving objects. The videos were collected mainly from the BBC Motion Gallery and the Getty Images website and are shared for research purposes only; please consult the terms and conditions of use on the respective websites.
Contains hand-labelled pixel annotations for 38 groups of images, each group containing a common foreground. Approximately 17 images
per group, 643 images total.
200 gray level images along with ground truth segmentations
Image segmentation and boundary detection. Grayscale and color segmentations for 300 images, divided into a training set of 200 images and a test set of 100 images.
328 side-view color images of horses, manually segmented. The images were randomly collected from the Web.
10 videos as input, with segmented image sequences as ground truth
Foreground/Background
For evaluating background modelling algorithms
Foreground/Background segmentation and Stereo dataset from Microsoft Cambridge
The SABS (Stuttgart Artificial Background Subtraction) dataset is an artificial dataset for pixel-wise evaluation of background models.
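To illustrate what pixel-wise evaluation of a background model involves, here is a minimal sketch (assuming NumPy; the running-average model, the threshold, and the F-measure choice are illustrative assumptions, not the protocol of any dataset above):

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model: blend the new frame into the background."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25):
    """Pixels differing from the background by more than thresh are foreground."""
    return np.abs(frame - bg) > thresh

def pixelwise_f1(pred, gt):
    """Pixel-wise F-measure of a predicted foreground mask vs. ground truth."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

# Toy example: static dark background, one bright object entering the scene.
bg = np.zeros((64, 64))
frame = bg.copy()
frame[10:20, 10:20] = 255            # the moving object
gt = np.zeros((64, 64), dtype=bool)
gt[10:20, 10:20] = True              # ground-truth foreground mask

pred = foreground_mask(bg, frame)
print(round(pixelwise_f1(pred, gt), 3))  # perfect detection on this toy frame -> 1.0
```

Artificial datasets such as SABS provide exact per-pixel ground truth, so scores like the F-measure above can be computed without annotation noise.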
Saliency Detection
120 Images / 20 Observers (Neil D. B. Bruce and John K. Tsotsos 2005).
27 Images / 40 Observers (O. Le Meur, P. Le Callet, D. Barba and D. Thoreau 2006).
100 Images / 31 Observers (Kootstra, G., Nederveen, A. and de Boer, B. 2008).
101 Images / 29 Observers (van der Linde, I., Rajashekar, U., Bovik, A.C., Cormack, L.K. 2009).
912 Images / 14 Observers (Krista A. Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba and Aude Oliva 2009).
758 Images / 75 Observers (R. Subramanian, H. Katti, N. Sebe, M. Kankanhalli and T-S. Chua 2010).
235 Images / 19 Observers (Jian Li, Martin D. Levine, Xiangjing An and Hangen He 2011).
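The image/observer pairs above are typically used to score a model's saliency map against human fixations. A minimal sketch of one common comparison, building a binary fixation map from observer gaze points and scoring a saliency map by AUC (the function names and the toy data are illustrative assumptions, not any benchmark's official protocol):

```python
import numpy as np

def fixation_map(points, shape):
    """Binary map marking each observer fixation (row, col) with True."""
    fmap = np.zeros(shape, dtype=bool)
    for r, c in points:
        fmap[r, c] = True
    return fmap

def auc_score(saliency, fixations):
    """AUC: probability that a fixated pixel outscores a non-fixated pixel."""
    pos = saliency[fixations]    # saliency values at fixated locations
    neg = saliency[~fixations]   # saliency values everywhere else
    # Compare every positive against every negative (fine at toy sizes).
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties

# Toy example: the only salient pixels are exactly where observers fixated.
sal = np.zeros((32, 32))
sal[9, 9] = 1.0
sal[10, 10] = 0.8
fix = fixation_map([(9, 9), (10, 10)], (32, 32))
print(auc_score(sal, fix))  # fixations outscore all other pixels -> 1.0
```

A chance-level saliency map scores near 0.5 under this measure, which is why AUC is a popular headline number for fixation-prediction datasets like those listed above.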