RetinaFace: Single-stage Dense Face Localisation in the Wild
Jiankang Deng*1,2,4   Jia Guo*2   Yuxiang Zhou1   Jinke Yu2   Irene Kotsia3   Stefanos Zafeiriou1,4
1 Imperial College London   2 InsightFace   3 Middlesex University London   4 FaceSoft
Abstract
Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantage of joint extra-supervised and self-supervised multi-task learning. Specifically, we make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch for predicting pixel-wise 3D face shape information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by 1.1% (achieving an AP of 91.4%). (4) On the IJB-C test set, RetinaFace enables state-of-the-art methods (ArcFace) to improve their results in face verification (TAR=89.59% for FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run in real time on a single CPU core for a VGA-resolution image. Extra annotations and code have been made available at: https://github.com/deepinsight/insightface/tree/master/RetinaFace.
1. Introduction
Automatic face localisation is the prerequisite step of facial image analysis for many applications such as facial attribute (e.g. expression [64] and age [38]) and facial identity recognition [45, 31, 55, 11]. A narrow definition of face localisation may refer to traditional face detection [53, 62], which aims at estimating the face bounding boxes without any scale and position prior. Nevertheless, in this paper
* Equal contributions.
Email: j.deng16@imperial.ac.uk; guojia@gmail.com
InsightFace is a nonprofit GitHub project for 2D and 3D face analysis.
Figure 1. The proposed single-stage pixel-wise face localisation method employs extra-supervised and self-supervised multi-task learning in parallel with the existing box classification and regression branches. Each positive anchor outputs (1) a face score, (2) a face box, (3) five facial landmarks, and (4) dense 3D face vertices projected on the image plane.
we refer to a broader definition of face localisation which includes face detection [39], face alignment [13], pixel-wise face parsing [48] and 3D dense correspondence regression [2, 12]. Such dense face localisation provides accurate facial position information across all scales.
Inspired by generic object detection methods [16, 43, 30, 41, 42, 28, 29], which have embraced all the recent advances in deep learning, face detection has recently achieved remarkable progress [23, 36, 68, 8, 49]. Different from generic object detection, face detection features smaller aspect-ratio variations (from 1:1 to 1:1.5) but much larger scale variations (from several pixels to thousands of pixels). The most recent state-of-the-art methods [36, 68, 49] focus on single-stage [30, 29] designs that densely sample face locations and scales on feature pyramids [28], demonstrating promising performance and faster speed compared to two-stage methods [43, 63, 8]. Following this route, we improve the single-stage face detection framework and propose a state-of-the-art dense face localisation method by exploiting multi-task losses coming from strongly supervised and self-supervised signals. Our idea is exemplified in Fig. 1.
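To make the multi-task design concrete, the sketch below shows how the per-anchor outputs listed in Fig. 1 (a face score, a box, five landmarks, and a dense 3D branch) could be produced from a shared feature map and combined into one weighted loss. This is a minimal illustration under stated assumptions, not the released implementation: the 1x1-conv heads, the num_anchors and mesh_dim parameters, and the loss weights w_box, w_pts and w_mesh are placeholders chosen for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHead(nn.Module):
    """Illustrative per-anchor heads: face score, box offsets, five landmarks,
    and a compact 3D shape code that a mesh decoder (not shown) would turn into
    projected dense vertices. Channel counts here are assumptions."""

    def __init__(self, in_channels=256, num_anchors=2, mesh_dim=128):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=1)          # face / non-face
        self.box = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)          # box offsets
        self.pts = nn.Conv2d(in_channels, num_anchors * 10, kernel_size=1)         # 5 landmarks x (x, y)
        self.mesh = nn.Conv2d(in_channels, num_anchors * mesh_dim, kernel_size=1)  # 3D shape code

    def forward(self, feat):
        return self.cls(feat), self.box(feat), self.pts(feat), self.mesh(feat)

def multitask_loss(cls_p, box_p, pts_p, mesh_p, cls_t, box_t, pts_t, mesh_t,
                   w_box=0.25, w_pts=0.1, w_mesh=0.01):
    """Weighted sum of strongly supervised and self-supervised losses.
    Inputs are per-anchor predictions/targets after anchor matching (omitted);
    the weights are placeholder assumptions, not the paper's values."""
    loss_cls = F.cross_entropy(cls_p, cls_t)      # face classification
    loss_box = F.smooth_l1_loss(box_p, box_t)     # box regression (positives only in practice)
    loss_pts = F.smooth_l1_loss(pts_p, pts_t)     # five-landmark regression
    loss_mesh = F.smooth_l1_loss(mesh_p, mesh_t)  # self-supervised dense 3D branch
    return loss_cls + w_box * loss_box + w_pts * loss_pts + w_mesh * loss_mesh

# Toy usage on random per-anchor tensors (N matched anchors).
if __name__ == "__main__":
    N = 16
    loss = multitask_loss(torch.randn(N, 2), torch.randn(N, 4), torch.randn(N, 10),
                          torch.randn(N, 128), torch.randint(0, 2, (N,)),
                          torch.randn(N, 4), torch.randn(N, 10), torch.randn(N, 128))
    print(loss.item())

The key design choice this sketch reflects is that the extra-supervised (landmark) and self-supervised (dense 3D) terms are simply added to the usual classification and box losses, so each positive anchor is trained on all four tasks at once.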
Typically, the face detection training process contains both classification and box regression losses [16]. Chen et al. [6] proposed to combine face detection and alignment in a joint cascade framework based on the observation that aligned