Fine-Grained Head Pose Estimation Without Keypoints: A New Method with Strong Performance
Updated 2024-09-08
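This paper addresses "Fine-Grained Head Pose Estimation Without Keypoints": estimating head pose more finely and robustly without relying on keypoint detection. Traditional head pose estimation algorithms usually extract keypoints from the target face (eyes, nose, mouth and so on) and then fit these 2D keypoints to a mean 3D head model to derive the head's yaw, pitch and roll angles. This approach is highly sensitive to the accuracy of keypoint detection, and it depends on an external head model and an ad-hoc fitting step, which can make it unstable and error-prone in practice.

The authors propose training a multi-loss convolutional neural network (CNN) on 300W-LP, a large synthetically expanded dataset. 300W-LP is obtained by expanding the original 300-W facial landmark benchmark and contains a wide range of pose variation. The network predicts Euler angles directly from image intensities, combining binned classification with regression to obtain more accurate and stable estimates. This avoids the bottlenecks of the traditional pipeline and achieves state-of-the-art results on public in-the-wild pose benchmarks.

The paper also reports results on a dataset normally used for depth-based pose estimation, where the method starts to close the gap with state-of-the-art depth methods. The approach improves accuracy while simplifying the pipeline and reducing dependence on external models and keypoint detection.

The authors emphasize the elegance and robustness of their method and release their training and testing code and pre-trained models as open source, which makes it easier for other researchers to reproduce and build on the work. The paper offers a strong keypoint-free solution to head pose estimation, with relevance to gaze estimation, attention modeling, 3D model fitting and face alignment in video.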
Fine-Grained Head Pose Estimation Without Keypoints
Nataniel Ruiz Eunji Chong James M. Rehg
Georgia Institute of Technology
{nataniel.ruiz, eunjichong, rehg}@gatech.edu
Abstract
Estimating the head pose of a person is a crucial problem that has a large number of applications such as aiding in gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally, head pose is computed by estimating some keypoints from the target face and solving the 2D to 3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, the extraneous head model and an ad-hoc fitting step. We present an elegant and robust way to determine pose by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from image intensities through joint binned pose classification and regression. We present empirical tests on common in-the-wild pose benchmark datasets which show state-of-the-art results. Additionally, we test our method on a dataset usually used for pose estimation using depth and start to close the gap with state-of-the-art depth pose methods. We open-source our training and testing code as well as release our pre-trained models (footnote 1).
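As a rough illustration of what "joint binned pose classification and regression" could look like for a single angle, the sketch below combines a cross-entropy term on the angle's bin with a squared-error term on the expected angle computed from the bin probabilities. The bin width, the angle range, the weight `alpha` and all function names are assumptions made for this example; the excerpt does not specify them, and a full model would presumably apply one such loss to each of yaw, pitch and roll.

```python
import math

# Assumed binning (not stated in this excerpt): 66 bins of 3 degrees over [-99, 99).
BIN_WIDTH = 3.0
ANGLE_MIN = -99.0
NUM_BINS = 66


def angle_to_bin(angle_deg):
    """Index of the bin containing angle_deg, clamped to the valid range."""
    idx = int(math.floor((angle_deg - ANGLE_MIN) / BIN_WIDTH))
    return max(0, min(NUM_BINS - 1, idx))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_angle(probs):
    """Continuous estimate: probability-weighted mean of the bin centers."""
    return sum(p * (ANGLE_MIN + (i + 0.5) * BIN_WIDTH) for i, p in enumerate(probs))


def joint_loss(logits, target_deg, alpha=1.0):
    """Cross-entropy on the binned angle plus alpha times the squared error
    between the expected angle and the target (alpha is a hypothetical weight)."""
    probs = softmax(logits)
    ce = -math.log(probs[angle_to_bin(target_deg)])
    se = (expected_angle(probs) - target_deg) ** 2
    return ce + alpha * se


# Usage: logits that peak at the correct bin give a much lower loss than flat logits.
target = 10.0
flat = [0.0] * NUM_BINS
peaked = list(flat)
peaked[angle_to_bin(target)] = 10.0
print(joint_loss(peaked, target), joint_loss(flat, target))
```

The classification term gives a coarse but stable training signal, while the expectation term lets the prediction be finer than the bin width, which is one plausible reading of "fine-grained" here.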
1. INTRODUCTION
The related problems of head pose estimation and facial expression tracking have played an important role over the past 25 years in driving vision technologies for non-rigid registration and 3D reconstruction and enabling new ways to manipulate multimedia content and interact with users. Historically, there have been several major approaches to face modeling, with two primary ones being discriminative/landmark-based approaches [26, 29] and parameterized appearance models, or PAMs [4, 15] (see [30] for additional discussion). In recent years, methods which directly extract 2D facial keypoints using modern deep learning tools [2, 35, 14] have become the dominant approach to facial expression analysis, due to their flexibility and robustness to occlusions and extreme pose changes. A by-product of keypoint-based facial expression analysis is the ability to recover the 3D pose of the head, by establishing correspondence between the keypoints and a 3D head model and performing alignment. However, in some applications the head pose may be all that needs to be estimated. In that case, is the keypoint-based approach still the best way forward? This question has not been thoroughly addressed using modern deep learning tools, a gap in the literature that this paper attempts to fill.

Footnote 1: https://github.com/natanielruiz/deep-head-pose
We demonstrate that a direct, holistic approach to estimating 3D head pose from image intensities using convolutional neural networks delivers superior accuracy in comparison to keypoint-based methods. While keypoint detectors have recently improved dramatically due to deep learning, head pose recovery from keypoints is inherently a two-step process with numerous opportunities for error. First, if sufficient keypoints fail to be detected, then pose recovery is impossible. Second, the accuracy of the pose estimate depends upon the quality of the 3D head model. Generic head models can introduce errors for any given participant, and the process of deforming the head model to adapt to each participant requires significant amounts of data and can be computationally expensive.
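To make the second source of error concrete, here is a toy one-angle calculation, not the fitting procedure of any cited method. It assumes an orthographic projection with unit scale and a single model parameter, the depth of the nose tip in front of the axis of rotation; all numbers and function names are hypothetical. If the generic model's nose depth differs from the person's, inverting the projection yields a biased yaw even when the keypoint is located perfectly.

```python
import math


def observed_nose_offset(true_depth_mm, yaw_deg):
    """Horizontal offset of the nose tip in the image under orthographic projection."""
    return true_depth_mm * math.sin(math.radians(yaw_deg))


def yaw_from_offset(offset_mm, model_depth_mm):
    """Invert the projection using the head model's assumed nose depth."""
    ratio = max(-1.0, min(1.0, offset_mm / model_depth_mm))
    return math.degrees(math.asin(ratio))


true_yaw = 30.0
# The person's actual nose depth is 25 mm (hypothetical value).
offset = observed_nose_offset(true_depth_mm=25.0, yaw_deg=true_yaw)

exact = yaw_from_offset(offset, model_depth_mm=25.0)   # model matches the person
biased = yaw_from_offset(offset, model_depth_mm=30.0)  # generic model is too deep

print(f"matched model: {exact:.1f} deg, generic model: {biased:.1f} deg")
```

In this toy setting the mismatched model underestimates the yaw by several degrees, with no keypoint noise involved at all.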
While it is common for deep learning based methods using keypoints to jointly predict head pose along with facial landmarks, the goal in this case is to improve the accuracy of the facial landmark predictions, and the head pose branch is not sufficiently accurate on its own: for example [14, 20, 21], which are studied in Sections 4.1 and 4.3. A conv-net architecture which directly predicts head pose has the potential to be much simpler, more accurate, and faster. While other works have addressed the direct regression of pose from images using conv-nets [31, 19, 3], they did not include a comprehensive set of benchmarks or leverage modern deep architectures.
In applications where accurate head pose estimation is required, a common solution is to utilize RGBD (depth) cameras. These can be very accurate, but suffer from a number of limitations: First, because they use active sensing, they can be difficult to use outdoors and in uncontrolled