The data set contains 3,425 videos of 1,595 different people. The shortest clip duration is 48 frames, the longest clip is 6,070 frames, and
the average length of a video clip is 181.3 frames.
The Microsoft Research Cambridge-12 Kinect gesture data set consists of sequences of human movements, represented as body-part
locations, and the associated gesture to be recognized by the system.
This dataset contains 250 pedestrian image pairs plus 775 additional images captured in a busy underground station, for research on person re-identification.
Face tracks, features, and shot boundaries from our CVPR 2013 paper, obtained from 6 episodes of Buffy the Vampire Slayer and 6 episodes of The Big Bang Theory.
ChokePoint is a video dataset designed for experiments in person identification/verification under real-world surveillance conditions. The
dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2.
Tracking
Walking pedestrians in busy scenarios from a bird's-eye view
Three pedestrian crossing sequences
The set was recorded in Zurich, using a pair of cameras mounted on a mobile platform. It contains 12,298 annotated pedestrians in roughly 2,000 frames.
BMP image sequences.
Data sets for tracking in aerial image sequences.
The MIT traffic data set is for research on activity analysis in crowded scenes. It includes a 90-minute traffic video sequence recorded by a stationary camera.
Segmentation
Ground truth database of 50 images with: Data, Segmentation, Labelling - Lasso, Labelling - Rectangle
Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets
Cows for object segmentation; five video sequences for motion segmentation
Geometric Context Dataset: pixel labels for seven geometric classes for 300 images
This dataset contains videos of crowds and other high-density moving objects. The videos were collected mainly from the BBC Motion Gallery and the Getty Images website and are shared for research purposes only; please consult the terms and conditions of use on the respective websites.
Contains hand-labelled pixel annotations for 38 groups of images, each group containing a common foreground. Approximately 17 images
per group, 643 images total.
200 gray level images along with ground truth segmentations
Image segmentation and boundary detection. Grayscale and color segmentations for 300 images, divided into a training set of 200 images and a test set of 100 images.
328 side-view color images of horses, manually segmented. The images were randomly collected from the Web.
10 videos as input, with segmented image sequences as ground truth
Foreground/Background
For evaluating background modelling algorithms
Foreground/Background segmentation and Stereo dataset from Microsoft Cambridge
The SABS (Stuttgart Artificial Background Subtraction) dataset is an artificial dataset for pixel-wise evaluation of background models.
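To illustrate what pixel-wise evaluation of a background model involves, here is a minimal sketch (assuming NumPy; the running-average model, the threshold, and the F-measure choice are illustrative assumptions, not the protocol of any dataset above):

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model: blend the new frame into the background."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25):
    """Pixels differing from the background by more than thresh are foreground."""
    return np.abs(frame - bg) > thresh

def pixelwise_f1(pred, gt):
    """Pixel-wise F-measure of a predicted foreground mask vs. ground truth."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

# Toy example: static dark background, one bright object entering the scene.
bg = np.zeros((64, 64))
frame = bg.copy()
frame[10:20, 10:20] = 255            # the moving object
gt = np.zeros((64, 64), dtype=bool)
gt[10:20, 10:20] = True              # ground-truth foreground mask

pred = foreground_mask(bg, frame)
print(round(pixelwise_f1(pred, gt), 3))  # perfect detection on this toy frame -> 1.0
```

Artificial datasets such as SABS provide exact per-pixel ground truth, so scores like the F-measure above can be computed without annotation noise.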
Saliency Detection
120 Images / 20 Observers (Neil D. B. Bruce and John K. Tsotsos 2005).
27 Images / 40 Observers (O. Le Meur, P. Le Callet, D. Barba and D. Thoreau 2006).
100 Images / 31 Observers (Kootstra, G., Nederveen, A. and de Boer, B. 2008).
101 Images / 29 Observers (van der Linde, I., Rajashekar, U., Bovik, A.C., Cormack, L.K. 2009).
912 Images / 14 Observers (Krista A. Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba and Aude Oliva 2009).
758 Images / 75 Observers (R. Subramanian, H. Katti, N. Sebe, M. Kankanhalli and T-S. Chua 2010).
235 Images / 19 Observers (Jian Li, Martin D. Levine, Xiangjing An and Hangen He 2011).
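The image/observer pairs above are typically used to score a model's saliency map against human fixations. A minimal sketch of one common comparison, building a binary fixation map from observer gaze points and scoring a saliency map by AUC (the function names and the toy data are illustrative assumptions, not any benchmark's official protocol):

```python
import numpy as np

def fixation_map(points, shape):
    """Binary map marking each observer fixation (row, col) with True."""
    fmap = np.zeros(shape, dtype=bool)
    for r, c in points:
        fmap[r, c] = True
    return fmap

def auc_score(saliency, fixations):
    """AUC: probability that a fixated pixel outscores a non-fixated pixel."""
    pos = saliency[fixations]    # saliency values at fixated locations
    neg = saliency[~fixations]   # saliency values everywhere else
    # Compare every positive against every negative (fine at toy sizes).
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties

# Toy example: the only salient pixels are exactly where observers fixated.
sal = np.zeros((32, 32))
sal[9, 9] = 1.0
sal[10, 10] = 0.8
fix = fixation_map([(9, 9), (10, 10)], (32, 32))
print(auc_score(sal, fix))  # fixations outscore all other pixels -> 1.0
```

A chance-level saliency map scores near 0.5 under this measure, which is why AUC is a popular headline number for fixation-prediction datasets like those listed above.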