constraint by minimizing the brightness error
$$\ell_{color} = \int_{H'_2} \| I'_2(d_1) - I_2 \|_1 + \int_{H'_1} \| I'_1(d_2) - I_1 \|_1 \quad (4)$$
where $H'_2 = H'_2(H_1, d_1)$ is the region warped from $H_1$ using the head poses and $d_1$ in the transformation process described above; similarly for $H'_1 = H'_1(H_2, d_2)$. We also apply a gradient discrepancy loss, which is robust to illumination change and thus widely adopted in stereo and optical flow estimation [6, 5, 56]:
$$\ell_{grad} = \int_{H'_2} \| \nabla I'_2(d_1) - \nabla I_2 \|_1 + \int_{H'_1} \| \nabla I'_1(d_2) - \nabla I_1 \|_1 \quad (5)$$
where ∇ denotes the gradient operator. To impose a spatial
smoothness prior, we add a second-order smoothness loss
$$\ell_{smooth} = \int_{H_1} | \Delta d_1 | + \int_{H_2} | \Delta d_2 | \quad (6)$$
where ∆ denotes the Laplace operator.
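For concreteness, the sketch below (PyTorch, with illustrative tensor and mask names) shows one way the photometric, gradient, and smoothness terms of Eqs. (4)-(6) could be evaluated, assuming the cross-view warped images $I'_2(d_1)$, $I'_1(d_2)$ and the binary region masks have already been produced by the warping step described above. It is a simplified reading of the losses, not the authors' implementation.

```python
# Minimal sketch of Eqs. (4)-(6); all tensors are NCHW and names are illustrative.
import torch
import torch.nn.functional as F


def masked_l1(pred, target, mask):
    """L1 error accumulated over a binary region mask (discrete version of the integral)."""
    return (torch.abs(pred - target) * mask).sum() / mask.sum().clamp(min=1.0)


def image_gradients(img):
    """Horizontal/vertical finite-difference gradients of an NCHW image."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx, dy


def laplacian(d):
    """Discrete Laplace operator applied to an N1HW depth map."""
    k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                     device=d.device).view(1, 1, 3, 3)
    return F.conv2d(d, k, padding=1)


def color_loss(I2_warp, I2, I1_warp, I1, H2p_mask, H1p_mask):       # Eq. (4)
    return masked_l1(I2_warp, I2, H2p_mask) + masked_l1(I1_warp, I1, H1p_mask)


def grad_loss(I2_warp, I2, I1_warp, I1, H2p_mask, H1p_mask):        # Eq. (5)
    loss = 0.0
    for warp, tgt, m in [(I2_warp, I2, H2p_mask), (I1_warp, I1, H1p_mask)]:
        dxw, dyw = image_gradients(warp)
        dxt, dyt = image_gradients(tgt)
        loss = loss + masked_l1(dxw, dxt, m[..., :, 1:]) + masked_l1(dyw, dyt, m[..., 1:, :])
    return loss


def smooth_loss(d1, d2, H1_mask, H2_mask):                          # Eq. (6)
    return masked_l1(laplacian(d1), torch.zeros_like(d1), H1_mask) + \
           masked_l1(laplacian(d2), torch.zeros_like(d2), H2_mask)
```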
Face depth as condition and output. Instead of directly estimating hair and ear depth from the input image $I$, we project the reconstructed face shape $F$ onto the image plane to obtain a face depth map $d^f$. We make $d^f$ an extra conditional input concatenated with $I$. Note that $d^f$ provides beneficial information (e.g., head pose, camera distance) for hair and ear depth estimation. In addition, it allows the known face depth around the contour to be easily propagated to the adjacent regions with unknown depth.
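As a small illustration, conditioning on the face depth simply amounts to concatenating the rendered depth map with the RGB input along the channel dimension before feeding the network; the tensor names below are hypothetical.

```python
import torch

I = torch.rand(1, 3, 256, 256)        # input portrait image
d_face = torch.rand(1, 1, 256, 256)   # face depth map rendered from the reconstructed face F
net_input = torch.cat([I, d_face], dim=1)   # 4-channel conditional input to the depth network
```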
More importantly, we train the network to also predict the depth of the facial region, using $d^f$ as the target:
$$\ell_{face} = \int_{F_1 - S^h_1 \cap F_1} | d_1 - d^f_1 | + \int_{F_2 - S^h_2 \cap F_2} | d_2 - d^f_2 | \quad (7)$$
where $S^h$ denotes the hair region defined by segmentation.
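A possible implementation of Eq. (7) is sketched below, assuming the projected face region $F$, the hair segmentation $S^h$, and the rendered face depth $d^f$ are available as binary masks and depth maps aligned with the predicted depth; the function and tensor names are illustrative.

```python
import torch


def face_depth_loss(d, d_face, face_mask, hair_mask):
    """Supervise predicted depth with the rendered face depth on the unoccluded face region."""
    visible_face = face_mask * (1.0 - hair_mask)   # F minus its overlap with the hair region
    return (torch.abs(d - d_face) * visible_face).sum() / visible_face.sum().clamp(min=1.0)

# Eq. (7) sums this term over the two views:
# l_face = face_depth_loss(d1, d_face1, F1_mask, Sh1_mask) + face_depth_loss(d2, d_face2, F2_mask, Sh2_mask)
```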
Note that learning face depth via $\ell_{face}$ should not introduce much extra burden for the network, since $d^f$ is provided as input. But crucially, we can now easily enforce consistency between the reconstructed 3D face and the estimated 3D geometry in the other regions, as in this case we can calculate the smoothness loss across the whole head regions $S_1$, $S_2$:
$$\ell_{smooth} = \int_{S_1} | \Delta d_1 | + \int_{S_2} | \Delta d_2 | \quad (8)$$
Figure 2 (2nd and 3rd columns) compares the results
with and without face depth. We also show quantitative
comparisons in Table 1 (2nd and 3rd columns). As can be
observed, using face depth significantly improves head ge-
ometry consistency and reconstruction accuracy.
[Figure 2: 3D head reconstruction results of our method under different settings (input, w/o face depth, with face depth, + $\ell_{layer}$).]

Layer-order loss. Hair often occludes part of the facial region, leading to two depth layers. To ensure the correct relative position between the hair and the occluded face region (i.e., the former should be in front of the latter), we introduce a layer-order loss defined as:
$$\ell_{layer} = \int_{S^h_1 \cap F_1} \max(0,\, d_1 - d^f_1) + \int_{S^h_2 \cap F_2} \max(0,\, d_2 - d^f_2) \quad (9)$$
which penalizes incorrect layer order. As shown in Fig. 2, the reconstructed shapes are more accurate with $\ell_{layer}$.
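The hinge form of Eq. (9) can be sketched as follows, again with assumed tensor names: the penalty is non-zero only where the predicted depth in the hair-over-face region is larger (farther from the camera) than the rendered face depth, i.e., where the hair would incorrectly lie behind the face.

```python
import torch


def layer_order_loss(d, d_face, face_mask, hair_mask):
    """Penalize hair depth that falls behind the rendered face depth in the occluded region."""
    occluded = hair_mask * face_mask                 # S^h ∩ F
    violation = torch.clamp(d - d_face, min=0.0)     # max(0, d - d^f)
    return (violation * occluded).sum() / occluded.sum().clamp(min=1.0)

# l_layer = layer_order_loss(d1, d_face1, F1_mask, Sh1_mask) + layer_order_loss(d2, d_face2, F2_mask, Sh2_mask)
```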
Network structure. We apply a simple encoder-decoder structure using ResNet-18 [25] as the backbone. We discard its global average pooling and final fully-connected layers, and append several transposed convolutional layers to upsample the feature maps back to the full resolution. Skip connections are added at the 64×64, 32×32, and 16×16 resolutions. The input image size is 256×256. More details of the network structure can be found in the supplementary material.
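A rough PyTorch sketch of such an encoder-decoder is given below. The channel widths, the number of transposed convolutional layers, and the output head are assumptions (the actual configuration is in the supplementary material), but the overall structure, a ResNet-18 encoder without global pooling and fc, a transposed-convolution decoder, and skip connections at 64×64, 32×32 and 16×16, follows the description above.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


class HeadDepthNet(nn.Module):
    def __init__(self, in_channels=4):                      # RGB + face depth condition
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.conv1 = nn.Conv2d(in_channels, 64, 7, stride=2, padding=3, bias=False)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu)   # 256 -> 128
        self.pool = backbone.maxpool                                             # 128 -> 64
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2              # 64, 32
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4              # 16, 8

        def up(cin, cout):   # transposed conv that doubles the spatial resolution
            return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                                 nn.ReLU(inplace=True))

        self.up16 = up(512, 256)         # 8 -> 16, then skip from layer3
        self.up32 = up(256 + 256, 128)   # 16 -> 32, then skip from layer2
        self.up64 = up(128 + 128, 64)    # 32 -> 64, then skip from layer1
        self.up128 = up(64 + 64, 32)     # 64 -> 128
        self.up256 = up(32, 16)          # 128 -> 256 (full resolution)
        self.head = nn.Conv2d(16, 1, 3, padding=1)           # per-pixel depth

    def forward(self, x):
        s128 = self.stem(x)
        s64 = self.layer1(self.pool(s128))
        s32 = self.layer2(s64)
        s16 = self.layer3(s32)
        s8 = self.layer4(s16)
        y = self.up16(s8)
        y = self.up32(torch.cat([y, s16], dim=1))
        y = self.up64(torch.cat([y, s32], dim=1))
        y = self.up128(torch.cat([y, s64], dim=1))
        return self.head(self.up256(y))
```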
5. Single-Image Head Pose Manipulation
Given the 3D head model reconstructed from the input portrait image, we modify its pose and synthesize new portrait images, as described below.
5.1. 3D Pose Manipulation and Projection
To change the head pose, one simply needs to apply a rigid transformation in 3D to the 3DMM face $F$ and the hair-ear mesh $H$, given the target pose $\bar{p}$ or pose displacement $\delta p$. After the pose is changed, we reproject the 3D model onto the 2D image plane to obtain coarse synthesis results. Two examples are shown in Fig. 3.
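As a sketch, this pose edit reduces to a rigid transform of the mesh vertices followed by a perspective projection back onto the image plane. The rotation, translation, and camera intrinsics below are made-up example values, and the rasterization/rendering step used to produce the coarse image is omitted.

```python
import numpy as np


def transform_vertices(V, R, t):
    """Apply a rigid transformation to an (N, 3) vertex array."""
    return V @ R.T + t


def project_perspective(V, K):
    """Project (N, 3) camera-space vertices to (N, 2) pixel coordinates."""
    uvw = V @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


# Example: rotate the whole head (face and hair-ear meshes) by 20 degrees of yaw.
yaw = np.deg2rad(20.0)
R = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
              [ 0.0,         1.0, 0.0        ],
              [-np.sin(yaw), 0.0, np.cos(yaw)]])
t = np.zeros(3)
K = np.array([[1000.0,    0.0, 128.0],       # hypothetical intrinsics for a 256x256 image
              [   0.0, 1000.0, 128.0],
              [   0.0,    0.0,   1.0]])
# face_vertices, hair_ear_vertices: (N, 3) vertex arrays of the reconstructed model
# uv_face = project_perspective(transform_vertices(face_vertices, R, t), K)
# uv_hair = project_perspective(transform_vertices(hair_ear_vertices, R, t), K)
```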
5.2. Image Refinement with Adversarial Learning
The reprojected images suffer from several issues. Notably, due to pose and expression changes, holes may appear where the missing background and/or head regions need to be hallucinated, akin to an image inpainting process. In addition, the reprojection procedure may also introduce artifacts due to imperfect rendering.