with 2D Scale Invariant Feature Transform (SIFT) to de-
velop multimodal face recognition. However, both the keypoint detection method and the features were sensitive to facial expressions. To achieve robustness to expressions, Mian et al. [35] proposed a parts-based multimodal hybrid method
(MMH) which exploited local and global features in the 2D
and 3D modalities. A key component of their method was
a variant of the ICP [7] algorithm which is computation-
ally expensive due to its iterative nature. Gupta et al. [23]
matched the 3D Euclidean and geodesic distances between
pairs of fiducial landmarks to perform 3D face recognition.
Berretti et al. [5] represented a 3D face with multiple mesh-
DOG keypoints and local geometric histogram descriptors
while Drira et al. [18] represented the facial surface by ra-
dial curves emanating from the nosetip.
Model based methods construct a 3D morphable face
model and fit it to each probe face. Face recognition is
performed by matching the model parameters to those in
the gallery. Gilani et al. [13] proposed a keypoint based
dense correspondence model and performed 3D face recog-
nition by matching the parameters of a statistical morphable
model called K3DM. Blanz et al. [8, 11] used the parame-
ters of their 3DMM [10] for face recognition. Passalis et
al. [46] proposed an Annotated Face Model (AFM) based
on an average facial 3D mesh. Later, Kakadiaris et al. [26]
proposed elastic registration using this AFM and performed
3D face recognition by comparing the wavelet coefficients
of the deformed images obtained from morphing. Model
fitting algorithms can be computationally expensive and do
not perform well on large galleries as shown in our results.
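The recognition step shared by these model-based methods reduces to comparing fitted parameter vectors between a probe and the gallery. The following minimal Python sketch illustrates only that matching step, assuming the (expensive) model fitting has already produced a coefficient vector per face; the use of cosine similarity and all names here are our own illustrative choices, not taken from any of the cited methods.

# Nearest-neighbour matching of morphable-model coefficients.
# The fitting stage that produces these coefficients is assumed given.
import numpy as np

def identify(probe_params, gallery_params, gallery_ids):
    """Return the gallery identity whose coefficient vector is closest
    to the probe under cosine similarity."""
    g = gallery_params / np.linalg.norm(gallery_params, axis=1, keepdims=True)
    p = probe_params / np.linalg.norm(probe_params)
    return gallery_ids[int(np.argmax(g @ p))]

# Toy example with hypothetical 100-dimensional model coefficients
gallery = np.random.randn(466, 100)      # an FRGCv2-sized gallery of 466 identities
ids = np.arange(466)
probe = gallery[42] + 0.05 * np.random.randn(100)
print(identify(probe, gallery, ids))     # almost certainly prints 42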
Both local and global techniques were tested on indi-
vidual 3D datasets, the largest one being FRGCv2 with a
gallery size of 466 identities. To the best of our knowledge,
none of the conventional methods have performed large-
scale 3D face recognition.
Deep Learning: Akin to progress in other applications of computer vision, deep learning has brought a dramatic leap in 2D face recognition accuracy. In 2014, the Facebook AI group proposed the nine-layer DeepFace model [58], consisting mainly of two convolutional, three locally-connected and two fully-connected (FC) layers. The network was trained on 4.4M 2D facial images of 4,030 identities and achieved an accuracy of 97.35% on the benchmark LFW [25] dataset, reducing the error of the previous state of the art by more than 27%. A year later, Google Inc. followed with FaceNet [53], based on eleven convolutional and three FC layers. The distinguishing features of this network were its training dataset of 200M face images of 8M identities and its triplet loss function.
The authors reported face recognition accuracy of 98.87%
on LFW. DeepFace and FaceNet were both trained on pri-
vate datasets which are not available to the broader research
community. Consequently, Parkhi et al. [45] crawled the web to collect a face database of 2.6M 2D images from 2,622 identities and presented the 16-layer VGG-Face model, comprising 13 convolutional and three FC layers. Despite training on a smaller dataset, the
authors reported face recognition accuracy of 98.95% on
the LFW dataset. Recently, however, the MegaFace Challenges [28, 42] claimed that the existing 2D benchmark datasets have reached saturation and proposed adding millions of faces to their galleries to match real-world scenarios. They showed that the face recognition accuracy of state-of-the-art 2D networks dropped by more than 20% when just a few thousand distractors were added to the galleries of public face recognition benchmark datasets.
The takeaway for the 3D domain is that CNNs on 2D data perform best when they learn from massive training sets and are designed specifically for the 2D modality, and yet their true performance can be validated only when they are tested against large gallery sizes.
To the best of our knowledge, only Kim et al. [29] have
presented deep 3D face recognition results. They fine-tuned the VGG-Face network [45] on an augmented dataset of 123,325 depth images and tested it individually on the Bosphorus [51], BU3DFE [65] and 3D-TEC (twins) [61] datasets. Except for the Bosphorus dataset, their results
do not outperform the state-of-the-art conventional meth-
ods. Moreover, they have not reported results on the chal-
lenging FRGCv2 dataset and their fine-tuned model is not
publicly available.
Data Augmentation: Dou et al. [17] and Richardson et
al. [50] generated thousands of synthetic 3D images for face
reconstruction using BFM [48], AFM [26] and 3DMM [10].
This approach generates 3D faces within the linear space of a specific statistical face model; the resulting faces typically vary within ±3 standard deviations of the model mean and have highly smooth surfaces. Gilani et al. [9] generated
synthetic images using a similar approach. However, these
images were used to train a 3D landmark identification net-
work. Kim et al. [29] fitted the BFM [48] to 577 identities
of FRGCv2 [49] database and induced 25 expressions in
each identity. They also introduced minor pose variations
between ±10° in yaw, pitch and roll for each original scan.
To simulate occlusions, the authors introduced eight ran-
dom occlusion patches to each 2D depth map to increase
the dataset to 123,325 scans. This method only increases
the intra-person variations without augmenting the number
of identities, which in this case remained 577.
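As an illustration of the above per-scan perturbations, the following minimal Python sketch applies a random rotation of up to ±10° in yaw, pitch and roll to a depth map and then adds random occlusion patches. It is written under our own assumptions (orthographic re-rendering, a zero-valued background, rectangular occluders, and coordinates and depth in comparable units) and is not the code of Kim et al. [29]; all function and parameter names are illustrative.

# Depth-map augmentation: small random pose jitter plus random occlusions.
import numpy as np
from scipy.spatial.transform import Rotation

def pose_jitter(depth, max_deg=10.0):
    """Rotate the depth map's point cloud by a random rotation of up to
    +/-max_deg about each axis and re-render an orthographic depth map."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    valid = depth > 0                            # assume 0 marks the background
    pts = np.stack([xs[valid], ys[valid], depth[valid]], 1).astype(np.float64)
    centroid = pts.mean(axis=0)
    angles = np.random.uniform(-max_deg, max_deg, size=3)
    rot = Rotation.from_euler('xyz', angles, degrees=True)
    pts = rot.apply(pts - centroid) + centroid   # rotate about the centroid
    out = np.zeros_like(depth)
    cols = np.clip(np.round(pts[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.round(pts[:, 1]).astype(int), 0, h - 1)
    np.maximum.at(out, (rows, cols), pts[:, 2])  # on collisions keep the larger depth
    return out

def random_occlusions(depth, n_patches=8, max_size=0.2):
    """Zero out n_patches random rectangular regions to simulate occlusion."""
    h, w = depth.shape
    out = depth.copy()
    for _ in range(n_patches):
        ph = np.random.randint(1, int(max_size * h) + 1)
        pw = np.random.randint(1, int(max_size * w) + 1)
        r = np.random.randint(0, h - ph + 1)
        c = np.random.randint(0, w - pw + 1)
        out[r:r + ph, c:c + pw] = 0
    return out

# Example: one augmented copy of a (stand-in) 160x160 range image in mm
depth_map = 40.0 + 60.0 * np.random.rand(160, 160)
augmented = random_occlusions(pose_jitter(depth_map), n_patches=8)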
3. Proposed Data Generation for Training
We use 3D facial scans of 1,785 individuals (a proprietary dataset), who were participants in various studies at our institution, to train our deep network. The number of identities in this dataset is larger than that of any 3D dataset but still not