attention importance weighting, ranking regularization, and
noise relabeling. Given a batch of images, a backbone CNN is first used to extract facial features. The self-attention importance weighting module then learns a weight for each image that captures its importance for loss weighting; uncertain facial images are expected to receive low importance weights.
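Although the exact layer design is not specified here, a minimal PyTorch sketch of such a weighting module might look as follows (the linear-plus-sigmoid head, the feature dimension, and all names are our assumptions, not the confirmed SCN implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImportanceWeighting(nn.Module):
    """Hypothetical weighting head: maps each backbone feature
    vector to a scalar importance weight in (0, 1)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1)

    def forward(self, features):  # features: (B, feat_dim)
        return torch.sigmoid(self.fc(features)).squeeze(1)  # (B,)

def weighted_ce_loss(logits, labels, weights):
    """Per-sample cross-entropy scaled by the learned weights,
    so low-importance (uncertain) samples contribute less."""
    per_sample = F.cross_entropy(logits, labels, reduction='none')
    return (weights * per_sample).mean()
```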
Further, the ranking regularization module ranks these weights in descending order, splits them into two groups (i.e., high and low importance weights), and regularizes the two groups by enforcing a margin between their average weights. This regularization is implemented as a loss function, termed the Rank Regularization loss (RR-Loss).
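For concreteness, such a margin constraint can be written as

$$\mathcal{L}_{RR} = \max\bigl(0,\ \delta_1 - (\alpha_H - \alpha_L)\bigr),$$

where $\alpha_H$ and $\alpha_L$ denote the mean importance weights of the high- and low-importance groups, respectively, and $\delta_1$ is the margin; the notation here is ours and is meant only as an illustration of the described constraint.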
The ranking regularization module ensures that the first module learns meaningful weights that highlight confident samples (e.g., those with reliable annotations) and suppress uncertain ones (e.g., those with ambiguous annotations). The last module is a careful relabeling module that attempts to relabel samples in the bottom group by comparing the maximum predicted probability with the probability of the given label.
A sample is assigned a pseudo label if its maximum prediction probability exceeds the probability of its given label by a margin threshold.
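In our own notation (an illustrative sketch, not the confirmed formulation), this relabeling rule can be written as

$$y' = \begin{cases} \arg\max_j p_j, & \text{if } \max_j p_j - p_y > \delta_2, \\ y, & \text{otherwise}, \end{cases}$$

where $p_j$ is the predicted probability of class $j$, $y$ is the given label, and $\delta_2$ is the margin threshold.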
In addition, since the main evidence of uncertainty is the incorrect/noisy annotation problem, we collect an extremely noisy FER dataset from the Internet, termed WebEmotion, to investigate the behavior of SCN under extreme uncertainty.
Overall, our contributions can be summarized as follows:
• We pose the uncertainty problem in facial expression recognition and propose a Self-Cure Network to reduce the impact of uncertainties.
• We design a rank regularization that supervises SCN to learn meaningful importance weights, which also serve as a reference for the relabeling module.
• We extensively validate SCN on synthetic FER data and on a new real-world uncertain emotion dataset (WebEmotion) collected from the Internet. SCN achieves 88.14% on RAF-DB, 60.23% on AffectNet, and 89.35% on FERPlus, setting new records on these benchmarks.
2. Related Work
2.1. Facial Expression Recognition
Generally, a FER system consists of three stages, namely face detection, feature extraction, and expression recognition. In the face detection stage, face detectors such as MTCNN [44] and Dlib [2] are used to locate faces in complex scenes, and the detected faces may optionally be aligned. For feature extraction, various methods are
designed to capture the facial geometry and appearance changes caused by facial expressions. According to the feature type, they can be grouped into engineered features and learning-
based features. Engineered features can be further divided into texture-based local features, geometry-based global features, and hybrid features. Texture-based local features mainly include SIFT [34], HOG [6], histograms of LBP [35], Gabor wavelet coefficients [26], etc.
Geometry-based global features mainly rely on the landmark points around the nose, eyes, and mouth. Hybrid feature extraction combines two or more engineered features, which can further enrich the representation. For learned features, Fasel [12] finds that a
shallow CNN is robust to face poses and scales. Tang [37]
and Kahou et al. [21] utilize deep CNNs for feature extraction and win the FER2013 and Emotiw2013 challenges, respectively. Liu et al. [27] propose a Facial Action Units-based CNN architecture for expression recognition. Recently, both Li et al. [25] and Wang et al. [42] have designed region-based attention networks for pose- and occlusion-aware FER, where the regions are cropped either around landmark points or at fixed positions.
2.2. Learning with Uncertainties
Uncertainties in the FER task mainly come from ambiguous facial expressions, low-quality facial images, inconsistent annotations, and incorrect annotations (i.e., noisy labels). In particular, learning with noisy labels has been extensively studied in the computer vision community, while the other aspects are rarely explored. To handle
noisy labels, one intuitive idea is to leverage a small set of
clean data that can be used to assess the quality of the labels
during the training process [40, 23, 8], or to estimate the
noise distribution [36], or to train the feature extractors [3].
Li et al. [23] propose a unified distillation framework using
‘side’ information from a small clean dataset and label relations in a knowledge graph to ‘hedge the risk’ of learning from noisy labels. Veit et al. [41] use a multi-task network that jointly learns to clean noisy annotations and to classify images. Azadi et al. [3] select reliable images via an auxiliary image regularization for deep CNNs with noisy
labels. Other methods do not need a small clean dataset but instead assume extra constraints or distributions on the noisy samples [31], such as a specific loss for randomly flipped labels [33], regularizing deep networks on corrupted labels with a MentorNet [20], or modeling the noise with a softmax layer that connects the latent correct labels to the noisy ones [13, 43]. For the
FER task, Zeng et al. [43] first consider the inconsistent
annotation problem among different FER datasets, and pro-
pose to leverage these uncertainties to improve FER. In con-
trast, our work focuses on suppressing these uncertainties
to learn better facial expression features.