Adaptive Cascade Regression Model: A New Method for Robust Face Alignment

Updated 2024-08-27, PDF, 10.79 MB

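"This research paper presents an adaptive cascade regression model for robust face alignment. It targets the difficulty that conventional cascade regression methods have with real-world face images containing occlusion. Cascade regression is a widely used approach to face alignment; it performs well on many datasets, but its accuracy degrades under occlusion. The model proposed by the authors, Qingshan Liu, Jiankang Deng, Jing Yang, Guangcan Liu and Dacheng Tao, uses shape-indexed appearance features to estimate the occlusion level of each landmark and weights each landmark accordingly. This weighting reduces the noise introduced by occlusion. The paper also designs an exemplar-based shape prior to suppress the influence of local image corruption on the alignment result. Extensive experiments show that the model is robust under complex conditions."

The key points of this paper are:

1. Cascaded regression: a stage-wise refinement method commonly used for facial landmark localization. It moves the landmark estimates toward their true positions step by step through a sequence of simple regressors, which makes it both efficient and accurate.
2. Robustness: the ability of a model to keep stable performance when the data contain noise, outliers or occlusion. The adaptive cascade regression model is proposed to address this problem.
3. Shape-indexed appearance features: a feature representation that combines shape information with pixel-level appearance and is used to estimate the occlusion level of each landmark. It helps identify which regions are likely occluded so that their influence can be reduced in later processing.
4. Occlusion estimation: the paper introduces an occlusion estimation mechanism that assigns a weight to each landmark and adjusts its influence on the regression according to how heavily it is occluded. This improves accuracy on occluded face images.
5. Adaptive weights: the weight of each landmark is adjusted dynamically from its estimated occlusion level, which reduces the effect of noise on the model and strengthens its robustness in complex environments.
6. Exemplar-based shape prior: known good shapes (exemplars) are used to constrain the current prediction, which suppresses the errors caused by local image corruption and keeps the estimate of the overall shape accurate.
7. Extensive experiments: the paper evaluates the new model through a large number of experiments, in particular on real-world face images that may contain occlusion.

These technical details show how the work improves face alignment with cascade regression under complex conditions, and they are a useful reference for research on face analysis.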

JOURNAL OF LATEX CLASS FILES, VOL. 4, NO. 5, APRIL 2015

indexed features $\Phi(I, S^{t-1})$ and adaptively constrain the intermediate shapes by exemplar shapes during cascade regression.

$$\arg\min_{R^t} \sum_{i=1} \left\| S_i^{*} - \Psi\left(D_S \alpha_i^{t-1}, \gamma_i^{t-1}\right) - R^t W_\Phi \, \Phi\left(I_i, \Psi\left(D_S \alpha_i^{t-1}, \gamma_i^{t-1}\right)\right) \right\|_2^2$$

where $W_\Phi = \mathrm{diag}(\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_N, 1)$, $\mathbf{w}_j = w_j I_{P \times P}$, $w_j$ indicates the occlusion level for landmark $j$, and $P$ is the dimension of the landmark feature vector. The goal of $W_\Phi$ is to allocate high weights to the uncorrupted landmarks and small or even zero weights to the occluded landmarks, which efficiently restrains the influence of the occluded regions. $\Psi(D_S \alpha^{t-1}, \gamma^{t-1})$ is the output of the exemplar-based shape constraint on $S^{t-1}$. $D_S \in \mathbb{R}^{2N \times M}$ is the exemplar shape dictionary, $\alpha^{t-1} \in \mathbb{R}^{M \times 1}$ is the reconstruction coefficient, $\Psi$ is the similarity transformation, and $\gamma^{t-1}$ is the similarity transformation coefficient.
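As a rough illustration of how the block-diagonal weighting matrix acts on the features, the sketch below scales each landmark's P-dimensional descriptor by its weight. It assumes the trailing 1 in the diagonal corresponds to a constant bias entry appended to the feature vector; the function name `weight_features` and the toy inputs are hypothetical and not taken from the paper.

```python
from typing import List


def weight_features(occlusion_weights: List[float],
                    landmark_features: List[List[float]]) -> List[float]:
    """Apply W = diag(w_1 I_P, ..., w_N I_P, 1) to the stacked features.

    landmark_features[j] is the P-dimensional descriptor of landmark j.
    The trailing 1.0 is an assumed bias entry, which W leaves unweighted.
    """
    if len(occlusion_weights) != len(landmark_features):
        raise ValueError("need exactly one weight per landmark")
    weighted: List[float] = []
    for w_j, feat_j in zip(occlusion_weights, landmark_features):
        # Multiplying by w_j * I_P scales every entry of the block by w_j.
        weighted.extend(w_j * value for value in feat_j)
    weighted.append(1.0)  # bias term, weight fixed to 1
    return weighted


# Hypothetical toy input: N = 2 landmarks, P = 2 features each.
features = [[1.0, 2.0], [3.0, 4.0]]
weights = [1.0, 0.0]  # the second landmark is treated as fully occluded
print(weight_features(weights, features))  # [1.0, 2.0, 0.0, 0.0, 1.0]
```

A zero weight removes the occluded landmark's descriptor from the regression input altogether, while the descriptors of visible landmarks pass through unchanged.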

A. Adaptive exemplar-based shape model

The sparse shape model is widely used as a shape constraint because it is able to correct gross errors of the input shape and preserve shape details, even if they are not statistically significant in the training set [23], [24].

$$\min_{\alpha} \left\| S - D_S \alpha \right\|_2^2 + \lambda \left\| \alpha \right\|_1, \tag{3}$$

where $S = (x_1, \ldots, x_N, y_1, \ldots, y_N)^T$ is the observed normalized shape, $D_S \in \mathbb{R}^{2N \times M}$ is the normalized shape dictionary, $\alpha \in \mathbb{R}^{M \times 1}$ is the sparse reconstruction coefficient, and $\lambda$ is the regularization parameter. The sparse shape model has two drawbacks, however: 1) It is very time-consuming because of the high-dimensional $\ell_1$ optimization problem, i.e., the so-called Lasso optimization [25]; 2) All the landmark points in the shape are treated equally, including the corrupted landmarks [23], [24]. If some landmarks are occluded, the errors from the misaligned landmarks will spread to all the other landmarks to some extent, due to the residual minimization used in the Procrustes analysis [43] and shape reconstruction [44].
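To make the cost of the Lasso formulation in Eq. (3) concrete, here is a minimal sketch that minimizes the same objective with ISTA (iterative soft-thresholding). ISTA is swapped in here for brevity; the text refers to an interior-point solver, not to this method. The function names and the toy dictionary are hypothetical, and the shapes are assumed to be already normalized.

```python
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of threshold * |x|."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_shape_ista(shape: List[float], dictionary: List[List[float]],
                      lam: float, iterations: int = 500) -> List[float]:
    """Minimise ||S - D a||_2^2 + lam * ||a||_1 with plain ISTA.

    dictionary[m] is the m-th exemplar shape (a column of D), length 2N.
    """
    num_atoms = len(dictionary)
    # The gradient is Lipschitz with constant 2 * sigma_max(D)^2; the squared
    # Frobenius norm is a safe upper bound on sigma_max(D)^2.
    lipschitz = 2.0 * sum(v * v for atom in dictionary for v in atom)
    if lipschitz == 0.0:
        raise ValueError("dictionary must contain a non-zero exemplar")
    alpha = [0.0] * num_atoms
    for _ in range(iterations):
        # residual = D a - S, computed once per iteration
        residual = [
            sum(dictionary[m][i] * alpha[m] for m in range(num_atoms)) - shape[i]
            for i in range(len(shape))
        ]
        for m in range(num_atoms):
            grad_m = 2.0 * sum(d * r for d, r in zip(dictionary[m], residual))
            alpha[m] = soft_threshold(alpha[m] - grad_m / lipschitz,
                                      lam / lipschitz)
    return alpha


# Hypothetical toy problem: two orthonormal "exemplars".
coeffs = sparse_shape_ista([3.0, 0.1], [[1.0, 0.0], [0.0, 1.0]], lam=1.0)
print(coeffs)
```

Every iteration touches the whole dictionary, and every row of the residual counts equally, which mirrors the two drawbacks listed above: the cost grows with the dictionary size, and corrupted landmarks influence the fit as much as clean ones.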

As can be seen from Fig. 1, there is an occlusion above the left eye which causes the alignment result in the corresponding area to be inaccurate. We take the normalized face shapes from the HELEN [45] training set as the shape dictionary and give two sparse reconstruction results under different regularization parameters. When we set $\lambda$ to 0.001, the number of non-zero coefficients is 20, and the gross error on the left eye is removed. However, the gross error on the left eyebrow is still there. As we increase $\lambda$ to 0.05, the constrained shape is smoother, but the accuracy of the non-occluded area is sacrificed. If we decrease $\lambda$, the constrained shape will be almost the same as the input face shape, without the ability to correct the gross error, because the reconstruction error is minimized. Thus, the sparse shape constraint sometimes has limited ability to fix the gross error.

To overcome the above issue, we propose a new exemplar-based shape model:

$$\min_{\alpha} \left\| W_S S - W_S D_S \Lambda \alpha \right\|_2^2, \tag{4}$$

where $W_S = \mathrm{diag}(w_1, \ldots, w_N, w_1, \ldots, w_N)$ is the weighting matrix, and $w_j$ is the occlusion level of landmark $j$. The purpose of $W_S$ is to evaluate the input shape with non-occluded landmarks as accurately as possible, and the selection matrix, written here as $\Lambda$, selects the most important k-nearest exemplar shapes from the shape dictionary. $\Lambda$ is used to select the nearest exemplar shapes of $W_S S$ from the dynamic exemplar shape dictionary $W_S D_S$. In other words, $W_S$ and $\Lambda$ impose different weights on the rows and columns, respectively, of the exemplar shape dictionary, which makes the shape constraint more flexible, robust and efficient.

Compared to the time-consuming Lasso optimization, the proposed adaptive exemplar-based shape model is more efficient. Given the shape dictionary $D_S \in \mathbb{R}^{2N \times M}$, the computational complexity of the interior-point convex optimization solver for the $\ell_1$ optimization problem is $O(\ldots)$. The computational complexity of our method lies only in the selection of the k-nearest neighbors, and the reconstruction coefficients are computed directly by the least squares method. In Lasso optimization, the sparse coefficients tend to be local, and items with larger coefficients are more similar to the input sample [46], [47]. In the proposed adaptive shape model, we select the k-nearest exemplar shapes of the input shape as the shape bases for reconstructing the input shape directly and set the coefficients corresponding to the remaining exemplars in the dictionary to zero. This means that the input face shape can be reconstructed by the nearest exemplar shapes [22], which is feasible in both empirical observation and theoretical analysis [22], [48].
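The two steps described above, weighted nearest-exemplar selection followed by a least-squares fit, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: shapes are assumed to be already normalized and stored as (x_1..x_N, y_1..y_N), the occlusion weights are given, and the small `ridge` term is added only to keep the normal equations solvable. All function names and the toy data are hypothetical.

```python
from typing import List


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def reconstruct_shape(shape: List[float], exemplars: List[List[float]],
                      landmark_weights: List[float], k: int,
                      ridge: float = 1e-9) -> List[float]:
    """Rebuild a shape from its k nearest exemplars, ignoring occluded points."""
    # diag(w_1..w_N, w_1..w_N): x_j and y_j share the weight of landmark j.
    w = list(landmark_weights) * 2
    w2 = [wi * wi for wi in w]

    def weighted_sq_dist(exemplar: List[float]) -> float:
        return sum(c * (s - e) ** 2 for c, s, e in zip(w2, shape, exemplar))

    order = sorted(range(len(exemplars)),
                   key=lambda m: weighted_sq_dist(exemplars[m]))
    basis = [exemplars[m] for m in order[:k]]
    # Normal equations of min_a || W (S - B a) ||_2^2 over the selected basis.
    gram = [[sum(c * p * q for c, p, q in zip(w2, bp, bq)) for bq in basis]
            for bp in basis]
    for d in range(len(basis)):
        gram[d][d] += ridge  # assumption: tiny regulariser for stability
    rhs = [sum(c * p * s for c, p, s in zip(w2, bp, shape)) for bp in basis]
    coeffs = solve_linear(gram, rhs)
    # The unweighted combination also fills in the occluded landmarks.
    return [sum(a * b[i] for a, b in zip(coeffs, basis))
            for i in range(len(shape))]


# Hypothetical toy data: N = 3 landmarks, landmark 3 is corrupted in the input.
exemplars = [[1.0, 2.0, 3.0, 0.0, 0.0, 0.0],
             [1.0, 3.0, 5.0, 0.0, 1.0, 2.0],
             [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]]
observed = [1.0, 2.5, 9.0, 0.0, 0.5, 7.0]
print(reconstruct_shape(observed, exemplars, [1.0, 1.0, 0.0], k=2))
```

In this toy case the corrupted third landmark is pulled back to roughly (4, 1), the value implied by the two nearest exemplars, while the visible landmarks are reproduced almost exactly. The cost is one sort over M distances plus a k-by-k linear solve, rather than an iterative optimization over the full dictionary.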

As shown in Fig. 2, when the occlusion levels of the landmarks are given, we can take only the non-occluded landmarks into account, and we use the nearest exemplar-based shape model to reconstruct the shape by Eq. (4). Surprisingly, even though we still reconstruct the face shape using only 20 exemplar face shapes, as in Fig. 1, the result is very satisfying. The gross error on the occluded area is almost removed, and the landmarks on the non-occluded area remain accurate.

B. Occlusion inference model

It is well known that it is difficult to detect occluded landmarks if only the local features around the landmarks are used. However, occlusion is generally a continuous local region with an irregular size and becomes obvious in the shape-normalized appearance [15]. Inspired by [34], we exploit the fact that the facial shape and appearance tend to be consistent on the exemplar dictionary. We construct the shape-normalized appearance dictionary $D_A$, which is directly derived from the exemplars $D_S$. The shape-indexed appearance is then reconstructed as $D_A \beta$, $\beta \in \mathbb{R}^{M \times 1}$, on the exemplar appearance dictionary, and the reconstruction discrepancy $\| A - D_A \beta \|$ is taken to estimate the occlusion levels $w_j$, $j = 1, \cdots, N$, of the landmarks and detect the occluded landmarks.
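The idea of turning a per-landmark reconstruction discrepancy into an occlusion weight can be sketched as below. The text on this page does not give the mapping from discrepancy to weight, so the exponential mapping and the `scale` parameter are purely assumptions for illustration, as are the function name and the toy data. The coefficients `beta` are taken as given here.

```python
import math
from typing import List


def occlusion_weights(appearance: List[List[float]],
                      appearance_dict: List[List[List[float]]],
                      beta: List[float],
                      scale: float = 1.0) -> List[float]:
    """Estimate one weight per landmark from the appearance residual.

    appearance[j]         : P-dim shape-normalized appearance at landmark j
    appearance_dict[m][j] : the same descriptor for exemplar m (column of D_A)
    beta[m]               : reconstruction coefficient of exemplar m
    """
    weights: List[float] = []
    for j, observed in enumerate(appearance):
        reconstructed = [
            sum(b * appearance_dict[m][j][p] for m, b in enumerate(beta))
            for p in range(len(observed))
        ]
        discrepancy = math.sqrt(
            sum((o - r) ** 2 for o, r in zip(observed, reconstructed)))
        # Assumed mapping: weight 1 for a perfect match, decaying toward 0
        # as the discrepancy grows.
        weights.append(math.exp(-discrepancy / scale))
    return weights


# Hypothetical toy data: M = 2 exemplars, N = 2 landmarks, P = 2.
dictionary = [[[1.0, 1.0], [2.0, 2.0]],
              [[3.0, 3.0], [4.0, 4.0]]]
observed = [[2.0, 2.0], [9.0, 9.0]]  # landmark 2 does not match any exemplar mix
print(occlusion_weights(observed, dictionary, beta=[0.5, 0.5]))
```

The first landmark is reproduced exactly by the exemplar combination and keeps a weight of 1, while the second landmark, whose appearance is far from the reconstruction, receives a weight close to 0.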

To effectively calculate the appearance reconstruction coefficients $\beta$, we utilize Canonical Correlation Analysis (CCA) [49]
