2.4.3 Classification and detection
Both the classification and detection tasks were evaluated as a set of 20 independent two-class tasks: e.g. for classification "is there a car in the image?", and for detection "where are the cars in the image (if any)?". A separate 'score' is computed for each of the classes. For the classification task, participants submitted results in the form of a confidence level for each image and for each class, with larger values indicating greater confidence that the image contains the object of interest. For the detection task, participants submitted a bounding box for each detection, with a confidence level for each bounding box. The provision of a confidence level allows results to be ranked such that the trade-off between false positives and false negatives can be evaluated, without defining arbitrary costs on each type of classification error.
In the case of classification, the correctness of a class prediction depends only on whether an image contains an instance of that class or not. However, for detection a decision must be made on whether a prediction is correct or not. To this end, detections were assigned to ground truth objects and judged to be true or false positives by measuring bounding box overlap. To be considered a correct detection, the area of overlap $a_o$ between the predicted bounding box $B_p$ and ground truth bounding box $B_{gt}$ must exceed 50% by the formula:

$$a_o = \frac{\mathrm{area}(B_p \cap B_{gt})}{\mathrm{area}(B_p \cup B_{gt})}, \qquad (1)$$

where $B_p \cap B_{gt}$ denotes the intersection of the predicted and ground truth bounding boxes and $B_p \cup B_{gt}$ their union.
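As an illustration, the overlap criterion of Eq. (1) can be computed directly from box coordinates. The following is a minimal Python sketch under our own conventions (corner coordinates (xmin, ymin, xmax, ymax); note that the official MATLAB devkit additionally counts pixel coordinates inclusively, which this sketch omits):

```python
def bbox_iou(box_p, box_gt):
    """Area of overlap a_o between two boxes given as (xmin, ymin, xmax, ymax)."""
    # Width/height of the intersection rectangle; zero if the boxes are disjoint.
    iw = max(0.0, min(box_p[2], box_gt[2]) - max(box_p[0], box_gt[0]))
    ih = max(0.0, min(box_p[3], box_gt[3]) - max(box_p[1], box_gt[1]))
    inter = iw * ih
    # Union = sum of the two areas minus the doubly-counted intersection.
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    return inter / (area_p + area_gt - inter)

# Correct-detection test of Eq. (1): the overlap must exceed 50%.
assert bbox_iou((0, 0, 10, 10), (0, 0, 10, 10)) > 0.5
```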
Detections output by a method were assigned to ground truth object annotations satisfying the overlap criterion, in order of decreasing confidence output. Ground truth objects with no matching detection are false negatives. Multiple detections of the same object in an image were considered false detections, e.g. 5 detections of a single object counted as 1 correct detection and 4 false detections; it was the responsibility of the participant's system to filter multiple detections from its output.
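A simplified sketch of this assignment procedure, in the same spirit as the devkit but not the official code (it reuses the hypothetical bbox_iou above):

```python
def match_detections(detections, gt_boxes, threshold=0.5):
    """Greedily assign detections to ground truth boxes of one class in one
    image; returns one True (correct) / False flag per detection.

    detections: list of (confidence, box) pairs
    gt_boxes:   list of ground truth boxes
    """
    detections = sorted(detections, key=lambda d: -d[0])  # decreasing confidence
    taken = [False] * len(gt_boxes)   # each object may be matched at most once
    flags = []
    for conf, box in detections:
        # Ground truth box with the largest overlap, if any.
        overlaps = [bbox_iou(box, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda j: overlaps[j], default=None)
        if best is not None and overlaps[best] > threshold and not taken[best]:
            taken[best] = True
            flags.append(True)        # true positive
        else:
            flags.append(False)       # false detection (including duplicates)
    return flags  # unmatched entries of gt_boxes are the false negatives
```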
For a given task and class, the precision-recall curve is computed from a method's ranked output. Up until 2009, interpolated average precision (Salton and McGill, 1986) was used to evaluate both classification and detection. However, from 2010 onwards the method of computing AP changed to use all data points rather than TREC-style sampling (which only sampled the monotonically decreasing curve at a fixed set of uniformly-spaced recall values 0, 0.1, 0.2, ..., 1). The intention in interpolating the precision-recall curve was to reduce the impact of the 'wiggles' in the precision-recall curve, caused by small variations in the ranking of examples. However, the downside of this interpolation was that the evaluation was too crude to discriminate between the methods at low AP.
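The two AP variants can be contrasted in a short sketch (our own illustrative NumPy code; the official evaluation is the MATLAB development kit):

```python
import numpy as np

def average_precision(is_tp, n_positive, interpolated=False):
    """AP for one class from output ranked by decreasing confidence.

    is_tp:      per-ranked-item flags, True where the item is a true positive
    n_positive: number of ground truth positives (defines recall)
    """
    is_tp = np.asarray(is_tp, dtype=bool)
    tp = np.cumsum(is_tp)
    fp = np.cumsum(~is_tp)
    recall = tp / n_positive
    precision = tp / (tp + fp)
    if interpolated:
        # TREC-style sampling used up to 2009: maximum precision at the
        # 11 fixed recall levels 0, 0.1, ..., 1.
        return float(np.mean([precision[recall >= r].max()
                              if (recall >= r).any() else 0.0
                              for r in np.linspace(0.0, 1.0, 11)]))
    # From 2010 onwards: area under the monotonically decreasing envelope
    # of the precision-recall curve, using every data point.
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    padded_recall = np.concatenate(([0.0], recall))
    return float(np.sum((padded_recall[1:] - padded_recall[:-1]) * envelope))
```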
2.4.4 Segmentation
The segmentation challenge was assessed per class on the intersection of the inferred segmentation and the ground truth, divided by the union (commonly referred to as the 'intersection over union' metric):

$$\text{seg. accuracy} = \frac{\text{true pos.}}{\text{true pos.} + \text{false pos.} + \text{false neg.}} \qquad (2)$$
Pixels marked 'void' in the ground truth (i.e. those around the border of an object that are marked as neither an object class nor background) are excluded from this measure. Note that we did not evaluate at the individual object level, even though the data had annotation that would have allowed this. Hence, the precision of the segmentation between overlapping objects of the same class was not assessed.
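A minimal sketch of this measure over label images (our own code; we assume the VOC convention of encoding void pixels with label value 255):

```python
import numpy as np

def segmentation_accuracy(pred, gt, class_id, void_id=255):
    """Per-class 'intersection over union' of Eq. (2) between predicted and
    ground truth label images (2-D integer arrays of class indices).

    Pixels labelled void in the ground truth are excluded entirely;
    void_id=255 is assumed from the VOC label encoding.
    """
    valid = gt != void_id
    p = (pred == class_id) & valid
    g = (gt == class_id) & valid
    tp = np.sum(p & g)      # true positives
    fp = np.sum(p & ~g)     # false positives
    fn = np.sum(~p & g)     # false negatives
    return tp / (tp + fp + fn)
```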
2.4.5 Action classification
The task is assessed in a similar manner to classification. For each action class a score for that class should be given for the person performing the action (indicated by a bounding box or a point), so that the test data can be ranked. The average precision is then computed for each class.
2.4.6 Person layout
At test time the method must output the bounding boxes of the parts (head, hands and feet) that are visible, together with a single real-valued confidence of the layout so that a precision/recall curve can be drawn. From VOC 2010 onwards, person layout was evaluated by how well each part individually could be predicted: for each of the part types (head, hands and feet) a precision/recall curve was computed, using the confidence supplied with the person layout to determine the ranking. A prediction of a part was considered true or false according to the overlap test, as used in the detection challenge, i.e. for a true prediction the bounding box of the part overlaps the ground truth by at least 50%. For each part type, the average precision was used as the quantitative measure.
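Conceptually this reuses the detection machinery per part type; a condensed sketch building on the hypothetical helpers above:

```python
def part_ap(part_results, n_gt_parts):
    """AP for one part type ('head', 'hand' or 'foot').

    part_results: one (layout_confidence, is_correct) pair per predicted
                  part, where is_correct is the result of the 50% overlap
                  test against the matching ground truth part (e.g. via
                  bbox_iou above)
    n_gt_parts:   number of ground truth parts of this type
    """
    ranked = sorted(part_results, key=lambda r: -r[0])  # rank by layout confidence
    return average_precision([correct for _, correct in ranked], n_gt_parts)
```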
This method of evaluation was introduced following
criticism of an earlier evaluation used in 2008, that was