Deep Learning Framework: Face Recognition and Attribute Prediction in Complex Environments


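"Face Recognition in Complex Environments: Deep Learning Face Attributes in the Wild". This paper discusses the challenges of face analysis under complex real-world conditions and proposes a novel deep learning framework for predicting face attributes in the wild. The authors, Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang, are from the Department of Information Engineering and the Department of Electronic Engineering at The Chinese University of Hong Kong.

Traditional face recognition methods tend to perform poorly under complex lighting, varied expressions, and occlusion. To address this, the team cascades two convolutional neural networks (CNNs), called LNet and ANet. The two networks are fine-tuned jointly to predict face attributes, but they are pre-trained differently. LNet, which handles face localization, is pre-trained on massive general object categories, which helps it locate faces across varied scenes. ANet, which handles attribute prediction, is pre-trained on massive face identities, which helps it capture fine-grained facial differences such as gender, age, and expression.

The framework not only outperforms the previous state of the art by a clear margin but also reveals useful facts about learning face representations. First, it shows that different pre-training strategies improve both face localization (LNet) and attribute prediction (ANet). Second, although the filters of LNet are fine-tuned only with image-level attribute tags, their response maps strongly indicate face locations, so LNet can be trained for localization without face bounding boxes or landmarks.

The framework may also benefit practical applications such as surveillance systems, identity verification on social media, and access control, since higher accuracy under real-world conditions makes such systems more practical and reliable. The study also offers a new perspective on how to design and optimize deep learning models for visual recognition problems.

In short, "Deep Learning Face Attributes in the Wild" is an important advance for deep learning in face analysis: its network structure and pre-training strategies improve face attribute prediction in complex environments and provide guidance for later research and applications.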


Figure 2. The proposed pipeline of attribute prediction (best viewed in color). Panels: (a) LNet_o; (b) LNet_s; (c) ANet; (d) extracting features to predict attributes with linear SVMs (example attributes: Smiling, Wavy Hair, No Beard, High Cheekbones, …).

… accuracy of face localization. Both LNet_o and LNet_s have network structures similar to AlexNet [13], whose hyper-parameters are specified in Fig.2 (a) and (b) respectively. The fifth convolutional layer (C5) of LNet_o indicates head-shoulders while C5 of LNet_s indicates faces, as shown by the high-response regions in their averaged response maps. Moreover, the input x_o of LNet_o is an m × n image, while the input x_s of LNet_s is the head-shoulder region, which is localized by LNet_o and resized to 227 × 227.
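The hand-off between the two localization networks can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the threshold fraction `frac=0.5`, and the nearest-neighbor resize are all hypothetical choices, and the C5 activations are assumed to arrive as a `(channels, H, W)` array.

```python
import numpy as np

def averaged_response_map(c5):
    """Average a (channels, H, W) activation tensor over its channels."""
    return c5.mean(axis=0)

def high_response_box(resp, frac=0.5):
    """Bounding box (top, left, bottom, right) of cells >= frac * max.

    bottom/right are exclusive; frac is an assumed, hypothetical threshold.
    """
    mask = resp >= frac * resp.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1

def crop_and_resize(image, box, map_shape, size=227):
    """Scale a response-map box to image coordinates, crop, and resize.

    Uses nearest-neighbor sampling to stay dependency-free (an assumption).
    """
    h, w = image.shape[:2]
    mh, mw = map_shape
    top, left, bottom, right = box
    y0, x0 = top * h // mh, left * w // mw
    y1 = max(y0 + 1, bottom * h // mh)
    x1 = max(x0 + 1, right * w // mw)
    crop = image[y0:y1, x0:x1]
    ys = np.arange(size) * crop.shape[0] // size  # source row per output row
    xs = np.arange(size) * crop.shape[1] // size  # source col per output col
    return crop[ys][:, xs]
```

In this sketch the cropped 227 × 227 region would play the role of x_s, the input to the second localization network.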

As illustrated in Fig.2 (c), ANet is learned to predict attributes y from the input face region x_f, which is detected by LNet_s and properly resized. Specifically, multi-view versions [13] of x_f are utilized to train ANet. Furthermore, ANet contains four convolutional layers, where the filters of C1 and C2 are globally shared and the filters of C3 and C4 are locally shared. The effectiveness of local filters has been demonstrated in many face-related tasks [26, 28]. To handle complex face variations, ANet is pre-trained by distinguishing massive face identities, which facilitates the learning of discriminative features.

Fig.2 (d) outlines the procedure of attribute recognition. ANet extracts a set of feature vectors (FCs) by cropping overlapping patches on x_f. An efficient feed-forward algorithm is developed to reduce redundant computation in the feature extraction stage. SVMs [8] are trained to predict attribute values given each FC. The final prediction is obtained by averaging all these values, to cope with small misalignment of face localization.
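The patch-and-average step can be sketched as below. This is a minimal illustration, not the paper's code: `extract_fc` stands in for the ANet feature extractor, `w` and `b` are the weights of an already trained linear SVM for one attribute, and averaging the SVM decision values (rather than hard labels) is an assumption. The sketch also extracts features patch by patch, so it does not reproduce the efficient feed-forward algorithm mentioned above.

```python
import numpy as np

def overlapping_patches(face, patch, stride):
    """Yield square patches of side `patch`, taken every `stride` pixels."""
    h, w = face.shape[:2]
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            yield face[y:y + patch, x:x + patch]

def predict_attribute(face, extract_fc, w, b, patch, stride):
    """Average linear-SVM decision values over all overlapping patches.

    extract_fc: hypothetical callable mapping a patch to a 1-D feature vector.
    Returns (mean decision value, True if the attribute is predicted present).
    """
    scores = [float(w @ extract_fc(p) + b)
              for p in overlapping_patches(face, patch, stride)]
    mean_score = float(np.mean(scores))
    return mean_score, mean_score > 0
```

Averaging over patches means a face crop that is shifted by a few pixels still contributes several well-aligned patches, which is the robustness to misalignment the paragraph describes.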

2.1. Face Localization

The cascade of LNet_o and LNet_s accurately localizes face regions by being trained on image-level attribute tags.

Pre-training LNet. Both LNet_o and LNet_s are pre-trained with 1,000 general object categories from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 [6], containing 1.2 million training images and 50 thousand validation images. All the data is employed for pre-training except one third of the validation data, which is held out for choosing hyper-parameters [13]. We augment data by cropping ten patches from each image: one patch at the center, four at the corners, and their horizontal flips. We adopt softmax for object classification, which is optimized by stochastic gradient descent (SGD) with back-propagation (BP) [16]. As shown in Fig.3 (a.2), the averaged response map in C5 of LNet_o already indicates locations of objects including human faces after pre-training.
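The ten-patch augmentation (center, four corners, and their horizontal flips) can be sketched in NumPy as follows. The function name `ten_crop` is hypothetical, and the crop size is left as a parameter because the excerpt does not state it; 227 is used in the test only because that is the input size mentioned elsewhere in the text.

```python
import numpy as np

def ten_crop(image, size):
    """Return 10 crops of an (H, W, C) image: center + 4 corners, then flips.

    Output shape is (10, size, size, C); crops 5..9 mirror crops 0..4.
    """
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size")
    cy, cx = (h - size) // 2, (w - size) // 2
    offsets = [
        (cy, cx),                # center
        (0, 0),                  # top-left
        (0, w - size),           # top-right
        (h - size, 0),           # bottom-left
        (h - size, w - size),    # bottom-right
    ]
    crops = [image[y:y + size, x:x + size] for y, x in offsets]
    flips = [c[:, ::-1] for c in crops]  # horizontal flip reverses columns
    return np.stack(crops + flips)
```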

Fine-tuning LNet. Both LNet_o and LNet_s are fine-tuned with attribute tags. Additional output layers are added to the LNets individually for fine-tuning and then removed for evaluation. LNet_o adopts the full image x_o as input, while LNet_s uses as input the high-response region x_s in the averaged response map in C5 of LNet_o, which roughly corresponds to the head-shoulders. The cross-entropy loss is used for attribute classification, i.e. $L = \sum_{i=1}^{N} y_i \log p(y_i|x) + (1 - y_i) \log\bigl(1 - p(y_i|x)\bigr)$, where $p(y_i = 1|x) = \frac{1}{1+\exp(-f(x))}$ is the probability of the i-th attribute given image x. As shown in Fig.3 (a.3), the response maps after fine-tuning become much cleaner and smoother, indicating that the filters learned from attribute tags can detect face patterns with complex variations. To appreciate the effectiveness of pre-training, we also include in Fig.3 (a.4) the averaged response map in C5 of a network trained directly from scratch with attribute tags but without pre-training. It cannot separate face regions well from background and other body parts.
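The attribute loss above can be evaluated numerically as in the following sketch. Two assumptions are made here that the excerpt does not spell out: the network is taken to output one real-valued score per attribute (the `logits` array), and the quantity that is minimized in practice is taken to be the negative of the summed log-likelihood as printed. Function names are hypothetical.

```python
import numpy as np

def attribute_log_likelihood(logits, labels):
    """Sum over attributes of y_i log p_i + (1 - y_i) log(1 - p_i).

    p_i = 1 / (1 + exp(-f_i)); logits holds one score f_i per attribute
    (an assumption), labels holds the binary tags y_i.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Stable forms: log p = -log(1 + e^{-f}), log(1 - p) = -log(1 + e^{f}).
    log_p = -np.logaddexp(0.0, -logits)
    log_1mp = -np.logaddexp(0.0, logits)
    return float(np.sum(labels * log_p + (1.0 - labels) * log_1mp))

def cross_entropy_loss(logits, labels):
    """Negative log-likelihood, the form usually minimized by SGD."""
    return -attribute_log_likelihood(logits, labels)
```

Using `np.logaddexp` avoids overflow for large scores, where a direct `np.log(1 / (1 + np.exp(-f)))` would lose precision.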

Thresholding and Proposing Windows. We show that the responses of C5 in LNet are discriminative enough to separate faces and background by simply searching for a threshold, such that a window with response larger than this threshold corresponds to a face and otherwise to background …
