Coarse-to-Fine Auto-Encoder Networks for Real-Time Face Alignment


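Coarse-to-Fine Auto-Encoder Networks (CFAN) is a paper co-authored by Jie Zhang, Shiguang Shan, Meina Kan and Xilin Chen, and it forms part of the research background of Prof. Shiguang Shan's SeetaFace work. It addresses real-time face alignment, a key preprocessing step for many face-related tasks such as face recognition, facial expression analysis and non-photorealistic face rendering. These tasks typically require inferring the facial key points (landmarks) from a detected face region. Deep learning has the potential to model this nonlinear mapping, but applying a deep network directly is not straightforward.

The authors' solution is the Coarse-to-Fine Auto-Encoder Network. CFAN cascades several stacked auto-encoder networks (SANs) and works in two phases: coarse estimation and fine adjustment. In the coarse phase, the first SAN takes a low-resolution input, which reduces the computational cost, and quickly predicts a rough set of landmarks that is already close to the true positions. In the fine phase, the subsequent SANs take this initial prediction and refine the landmark positions step by step until the desired accuracy is reached.

This layered approach has several advantages. First, it lowers the difficulty of training the deep network, so the model can learn the complexity of face alignment more effectively. Second, the stage-by-stage refinement helps avoid overfitting and improves generalization. Finally, because it is built on auto-encoders, CFAN has some capacity for data compression and reconstruction, which helps reduce memory consumption and improve real-time performance.

Overall, CFAN is a deep learning architecture designed for real-time, efficient and accurate facial landmark localization, which matters for many face-based computer vision applications. The experiments and evaluations in the paper demonstrate its effectiveness and efficiency in practical scenarios, and offer new ideas and technical support for further research on real-time face alignment.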
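The staged structure described above can be sketched in a few lines. This is only a structural sketch under stated assumptions, not the authors' implementation: the trained SANs are replaced by hand-written stand-in functions, and the names `downsample`, `coarse_to_fine_align`, `toy_global` and `toy_local`, the downsampling factors, and the use of normalized coordinates are all hypothetical choices made for illustration.

```python
import numpy as np

def downsample(image, factor):
    """Average-pool a 2-D image by an integer factor (cropping any remainder)."""
    h = image.shape[0] - image.shape[0] % factor
    w = image.shape[1] - image.shape[1] % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def coarse_to_fine_align(image, global_stage, local_stages, factors):
    """Cascade: one coarse global estimate, then successive local refinements.

    global_stage(img)        -> initial shape, array of (n_points, 2)
    local_stages[k](img, S)  -> shape update with the same size as S
    factors                  -> one downsampling factor per stage, coarse to fine
    """
    if len(factors) != len(local_stages) + 1:
        raise ValueError("need one factor for the global stage plus one per local stage")
    shape = global_stage(downsample(image, factors[0]))
    history = [shape]
    for stage, factor in zip(local_stages, factors[1:]):
        shape = shape + stage(downsample(image, factor), shape)
        history.append(shape)
    return shape, history

# Toy demo with hand-written stand-ins for the trained networks (assumption).
true_shape = np.array([[0.30, 0.40], [0.70, 0.40], [0.50, 0.75]])  # normalized (x, y)
mean_shape = np.full((3, 2), 0.5)

def toy_global(img):
    # Stand-in for the first SAN: move halfway from the mean shape to the truth.
    return mean_shape + 0.5 * (true_shape - mean_shape)

def toy_local(img, shape):
    # Stand-in for a later SAN: remove half of the remaining error.
    return 0.5 * (true_shape - shape)

image = np.zeros((64, 64))
final_shape, history = coarse_to_fine_align(
    image, toy_global, [toy_local] * 3, factors=[8, 4, 2, 1])
errors = [float(np.abs(s - true_shape).max()) for s in history]
print(errors)
```

The point of the sketch is the control flow: each stage only has to correct what the previous stage left over, so its search space stays small. Landmarks are kept in normalized coordinates here so that the same shape array remains valid at every resolution.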


strategy, the search space of each SAN, or in other words the difficulty of the task for each SAN, is well controlled and thus more tractable. Benefiting from joint local features, our method is more robust to partial occlusions than DCNN [26], as shown in the last row of Fig. 2.

Extensive evaluations on several public databases, i.e., XM2VTS [22], LFPW [3] and HELEN [18], show that our method achieves markedly better accuracy than state-of-the-art methods such as SDM and DRMF. Furthermore, our method (implemented in Matlab) takes about 23 milliseconds per image to predict 68 facial points, excluding face detection time, on a desktop machine with an Intel i7-3770 CPU (3.4 GHz).

2 Related Works

2.1 Local Models with Regression Fitting

Recently, local model methods with regression fitting [28,11,33,31,2] have made great progress on facial point detection, especially SDM [31]. Local methods such as ASMs [8,15,23] and CLMs [9,25] solve the optimization problem with the Gauss-Newton method. Instead of computing the Jacobian and Hessian matrices, SDM learns generic descent directions and re-scaling factors by linear regression. Specifically, given an image $x \in \mathbb{R}^{d}$, let $S$ denote the shape vector containing the coordinates of the facial points. The objective of most regression fitting models can be formulated as optimizing a sequence of successive shape updates $\Delta S$:

$$f(S_0 + \Delta S) = \|\Phi(S_0 + \Delta S) - \Phi(S_g)\|_2^2, \tag{1}$$

where $S_0$ and $S_g$ denote the initial shape and the ground truth shape respectively, and $\Phi$ is a nonlinear feature extraction function of a shape. The shape update $\Delta S$ can be obtained by Newton's method:

$$\Delta S = -H^{-1} J_f = -2 H^{-1} J_\Phi^{T} \left(\Phi(S_0) - \Phi(S_g)\right), \tag{2}$$

where $J_f$ is the gradient of $f$, and $J_\Phi$ and $H$ are the Jacobian of $\Phi$ and the Hessian of $f$, all evaluated at $S_0$.
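To make the Newton update concrete, the following sketch applies it to a small synthetic problem. It is an illustration under assumptions, not code from the paper: the feature map `phi`, the mixing matrix `A`, and the finite-difference Jacobian are invented for the example, and the Hessian is replaced by the Gauss-Newton approximation $H \approx 2 J_\Phi^{T} J_\Phi$.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # arbitrary mixing matrix (assumption)

def phi(shape):
    """Toy nonlinear feature map standing in for Phi."""
    return A @ (shape + 0.1 * np.tanh(shape))

def numerical_jacobian(func, shape, eps=1e-6):
    """Central-difference Jacobian of func at shape."""
    base = func(shape)
    jac = np.zeros((base.size, shape.size))
    for j in range(shape.size):
        step = np.zeros_like(shape)
        step[j] = eps
        jac[:, j] = (func(shape + step) - func(shape - step)) / (2 * eps)
    return jac

def gauss_newton_step(shape, target_features):
    """One update Delta S = -2 H^{-1} J^T r, with H approximated by 2 J^T J."""
    residual = phi(shape) - target_features
    jac = numerical_jacobian(phi, shape)
    hessian = 2.0 * jac.T @ jac
    return -2.0 * np.linalg.solve(hessian, jac.T @ residual)

S_g = np.array([1.0, -0.5])   # ground truth shape (toy)
S = np.array([0.0, 0.0])      # initial shape S_0
target = phi(S_g)
for _ in range(10):
    S = S + gauss_newton_step(S, target)
print(S)
```

Note that the loop needs `phi(S_g)`, which is available here only because the problem is synthetic, and it recomputes a Jacobian at every iteration.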

SDM directly estimates the descent direction $R_0 = -2 H^{-1} J_\Phi^{T}$ by a linear regression between the appearance information and the shape deviation, avoiding the costly computation of the Jacobian and the inverse of the Hessian. Thus, in SDM, Eq. (2) is reformulated as below:

$$\Delta S_1 = R_0 \Phi(S_0) + b_0, \tag{3}$$

where $b_0$ is a bias term that stands in for the $\Phi(S_g)$ term of Eq. (2). In a similar way, SDM can learn a sequence of generic descent directions $\{R_k\}$ and bias terms $\{b_k\}$, so that the update at the $k$-th iteration is

$$\Delta S_k = R_{k-1} \Phi(S_{k-1}) + b_{k-1}. \tag{4}$$
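The cascade of Eqs. (3) and (4) can be sketched as a sequence of least-squares fits. This is a toy illustration under assumptions, not the SDM reference implementation: `extract_features` is a synthetic stand-in for $\Phi$ (a real system would compute appearance descriptors from the image around the current shape), and the matrix `mix`, the sample count and the number of stages are arbitrary.

```python
import numpy as np

def extract_features(shapes, truths, mix):
    """Synthetic stand-in for Phi(S): saturating response to the offset from the truth.

    In a real system this would read image appearance around the current shape;
    here the hidden truth plays the role of the image.
    """
    return np.tanh((shapes - truths) @ mix)

def train_cascade(truths, mix, n_stages=4):
    """Learn one (R, b) pair per stage by least squares, following Eq. (3)-(4)."""
    n = truths.shape[0]
    shapes = np.tile(truths.mean(axis=0), (n, 1))  # mean-shape initialization
    stages = []
    errors = [float(np.mean(np.sum((truths - shapes) ** 2, axis=1)))]
    for _ in range(n_stages):
        features = extract_features(shapes, truths, mix)
        design = np.hstack([features, np.ones((n, 1))])  # extra column learns the bias
        coef, *_ = np.linalg.lstsq(design, truths - shapes, rcond=None)
        R, b = coef[:-1].T, coef[-1]
        shapes = shapes + features @ R.T + b  # Delta S = R Phi(S) + b, row-wise
        stages.append((R, b))
        errors.append(float(np.mean(np.sum((truths - shapes) ** 2, axis=1))))
    return stages, errors

rng = np.random.default_rng(0)
truths = rng.normal(size=(200, 2))          # toy ground truth shapes
mix = np.array([[1.0, 0.3], [-0.2, 0.8]])   # arbitrary feature mixing (assumption)
stages, errors = train_cascade(truths, mix)
print(errors)
```

Because each stage is fitted on the shapes produced by the previous one, the mean squared training error cannot increase from stage to stage: the zero update is always among the candidates the least-squares fit considers.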

Most methods, including SDM, use the mean shape as the initialization, which may get stuck in a local minimum when the initialization is poor. To reduce the effect of bad initializations, Cao et al. [6] use a multiple-initialization strategy and Burgos-Artizzu et al. [5] adopt a smart-restart technique, but there is still a long way to go.
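The idea of running from several initializations can be sketched as follows. The text above only states that multiple initializations are used; combining the results with an element-wise median is an assumption made for this example, and `align_with_restarts` and `toy_refine` are hypothetical names.

```python
import numpy as np

def align_with_restarts(refine, initial_shapes):
    """Refine from every initial shape and return the element-wise median."""
    results = np.stack([refine(np.asarray(s0, dtype=float)) for s0 in initial_shapes])
    return np.median(results, axis=0)

# Toy refiner: converges only when started close enough to the truth.
true_shape = np.array([[0.3, 0.4], [0.7, 0.4]])

def toy_refine(shape):
    close = np.linalg.norm(shape - true_shape) < 0.5
    return true_shape.copy() if close else shape  # a bad start stays stuck

starts = [true_shape + d for d in (0.05, -0.05, 0.1, -0.1, 2.0)]
estimate = align_with_restarts(toy_refine, starts)
print(estimate)
```

In this toy setting one of the five starts is far from the truth and never converges, yet the median ignores it. The cost is that the refiner runs once per initialization.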
