小目标人脸检测：深度层次特征与细粒度识别

1星需积分: 24 166 浏览量更新于2024-09-07 收藏 9.92MB PDF 举报

小目标人脸检测：探索挑战与解决策略在计算机视觉领域，小目标检测一直是一个具有挑战性的课题。这篇发表于2018年CVPR（计算机视觉与模式识别）会议的论文《FindingTinyFaces》由Peiyun Hu和Deva Ramanan两位作者提出，他们针对小目标人脸检测提出了创新的方法，尤其是在处理尺寸变化、图像分辨率以及上下文理解这三个关键方面。首先，论文强调了尺度不变性在小目标检测中的重要性。传统上，许多对象识别方法追求的是对物体大小的鲁棒性，但人脸识别的挑战在于，识别3像素高度的小脸和300像素高度的大脸所依赖的特征是截然不同的。为了应对这一问题，研究者们没有采用传统的单一尺度检测器，而是开发了一种新型策略——训练不同尺度的专用检测器。这种方法旨在提高检测效率，通过让单个深度特征层次的不同层共享特征，实现多任务学习。其次，图像分辨率对小目标的清晰度至关重要。高分辨率的图像能够提供更多的细节，有助于更准确地定位和识别小人脸。论文可能探讨了如何在保持高清的同时，优化网络架构以适应不同尺度的目标，比如通过调整卷积核大小、步幅或池化策略。最后，上下文推理是另一个关键要素。在复杂的图像环境中，理解人脸与其他元素之间的关系有助于区分真正的人脸和背景干扰。论文可能探讨了如何利用周围环境的信息，如人脸与周围物体的位置、姿势和语义关联，来增强小目标人脸的检测性能。论文的核心贡献可能包括一个专门设计的算法或网络架构，它能够在大规模数据集上训练出高效且准确的小目标人脸检测模型，同时在面对复杂场景和多种尺度变化时展现出良好的稳健性和准确性。此外，为了验证其效果，作者可能还展示了实验结果，对比了他们的方法与传统方法在小目标人脸检测任务上的性能，并通过信心度评估来衡量错误检测的可能性。这篇论文不仅解决了小目标人脸检测的理论问题，而且提供了实用的技术解决方案，为该领域的研究者和工程师提供了有价值的研究方向和实践经验。通过深入研究和应用这些概念，计算机视觉技术在诸如安防监控、社交媒体分析等场景中的小目标识别能力将得到显著提升。

Figure 3: On the left, we visualize a large and small face,

both with and without context. One does not need context

to recognize the large face, while the small face is dramat-

ically unrecognizable without its context. We quantify this

observation with a simple human experiment on the right,

where users classify true and false positive faces of our pro-

posed detector. Adding proportional context (by enlarging

the window by 3X) provides a small improvement on large

faces but is insufﬁcient for small faces. Adding a ﬁxed con-

textual window of 300 pixels dramatically reduces error on

small faces by 20%. This suggests that context should be

modeled in a scale-variant manner. We operationalize this

observation with foveal templates of massively-large recep-

tive ﬁelds (around 300x300, the size of the yellow boxes).

2. Related work

Scale-invariance: The vast majority of recognition

pipelines focus on scale-invariant representations, dating

back to SIFT[15]. Current approaches to detection such as

Faster RCNN [18] subscribe to this philosophy as well, ex-

tracting scale-invariant features through ROI pooling or an

image pyramid [19]. We provide an in-depth exploration

of scale-variant templates, which have been previously pro-

posed for pedestrian detection[17], sometimes in the con-

text of improved speed [3]. SSD [13] is a recent technique

based on deep features that makes use of scale-variant tem-

plates. Our work differs in our exploration of context for

tiny object detection.

Context: Context is key to ﬁnding small instances as

shown in multiple recognition tasks. In object detection, [2]

stacks spatial RNNs (IRNN[11]) model context outside the

region of interest and shows improvements on small object

detection. In pedestrian detection, [17] uses ground plane

estimation as contextual features and improves detection on

small instances. In face detection, [28] simultaneously pool

ROI features around faces and bodies for scoring detections,

which signiﬁcantly improve overall performance. Our pro-

posed work makes use of large local context (as opposed to

a global contextual descriptor [2, 17]) in a scale-variant way

(as opposed to [28]). We show that context is mostly useful

for ﬁnding low-resolution faces.

Multi-scale representation: Multi-scale representation

has been proven useful for many recognition tasks. [8, 14,

1] show that deep multi-scale descriptors (known as “hy-

percolumns”) are useful for semantic segmentation. [2, 13]

demonstrate improvements for such models on object detec-

tion. [28] pools multi-scale ROI features. Our model uses

“hypercolumn” features, pointing out that ﬁne-scale fea-

tures are most useful for localizing small objects (Sec. 3.1

and Fig. 5).

RPN: Our model superﬁcially resembles a region-

proposal network (RPN) trained for a speciﬁc object class

instead of a general “objectness” proposal generator [18].

The important differences are that we use foveal descrip-

tors (implemented through multi-scale features), we select

a range of object sizes and aspects through cross-validation,

and our models make use of an image pyramid to ﬁnd ex-

treme scales. In particular, our approach for ﬁnding small

objects make use of scale-speciﬁc detectors tuned for inter-

polated images. Without these modiﬁcations, performance

on small-faces dramatically drops by more than 10% (Ta-

ble 1).

3. Exploring context and resolution

In this section, we present an exploratory analysis of the

issues at play that will inform our ﬁnal model. To frame

the discussion, we ask the following simple question: what

is the best way to ﬁnd small faces of a ﬁxed-size (25x20)?.

By explicitly factoring out scale-variation in terms of the

desired output, we can explore the role of context and the

canonical template size. Intuitively, context will be crucial

for ﬁnding small faces. Canonical template size may seem

like a strange dimension to explore - given that we want to

ﬁnd faces of size 25x20, why deﬁne a template of any size

other than 25x20? Our analysis gives a surprising answer

of when and why this should be done. To better understand

the implications of our analysis, along the way we also ask

the analogous question for a large object size: what is the

best way to ﬁnd large faces of a ﬁxed-size (250x200)?.

Setup: We explore different strategies for building

scanning-window detectors for ﬁxed-size (e.g., 25x20)

faces. We treat ﬁxed-size object detection as a binary

heatmap prediction problem, where the predicted heatmap

at a pixel position (x, y) speciﬁes the conﬁdence of a ﬁxed-

size detection centered at (x, y). We train heatmap predic-

tors using a fully convolutional network (FCN) [14] deﬁned

over a state-of-the-art architecture ResNet [9]. We explore

multi-scale features extracted from the last layer of each

res-block, i.e. (res2cx, res3dx, res4fx, res5cx) in terms of

ResNet-50. We will henceforth refer to these as (res2, res3,

res4, res5) features. We discuss the remaining particulars of

our training pipeline in Section 5.

剩余11页未读，继续阅读

xiangpijiao

粉丝: 7

小目标人脸检测：深度层次特征与细粒度识别

小程序demo：人脸检测

Python-Insightface人脸检测识别的最小化推理包

小目标人脸检测的ｐｐｔ

WiderFace小目标人脸检测数据集VOC+YOLO格式7906张图片

目标检测&目标追踪&人脸检测&人脸识别

多目标人脸检测方法研究教材.docx

人脸检测、人眼检测、目标检测

RCNN_目标检测_人脸检测

基于YOLO-lite的web实时人脸检测，tfjs人脸检测，目标检测

基于YOLO-lite的web实时人脸检测，tfjs人脸检测，目标检测.zip

最新资源