
Recognizing Emotions From
an Ensemble of Features
Usman Tariq, Student Member, IEEE, Kai-Hsiang Lin, Zhen Li, Xi Zhou, Zhaowen Wang,
Vuong Le, Student Member, IEEE, Thomas S. Huang, Life Fellow, IEEE, Xutao Lv, and Tony X. Han
Abstract—This paper details the authors’ efforts to push the
baseline of emotion recognition performance on the Geneva
Multimodal Emotion Portrayals (GEMEP) Facial Expression
Recognition and Analysis database. Both subject-dependent and
subject-independent emotion recognition scenarios are addressed
in this paper. The approach toward solving this problem involves
face detection, followed by key-point identification, then feature
generation, and, finally, classification. An ensemble of features consisting of hierarchical Gaussianization, scale-invariant feature transform, and coarse motion features has been used. In the classification stage, we used support vector machines.
The classification task has been divided into person-specific and person-independent emotion recognition using face recognition with either manual labels or automatic algorithms. With manual identification of subjects, we achieve classification rates of 100% for person-specific recognition, 66% for person-independent recognition, and 80% overall.
Index Terms—Biometrics, computer vision, emotion recogni-
tion, machine vision.
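As a rough illustration of the pipeline summarized in the abstract (face detection, feature extraction, and support-vector-machine classification), a minimal sketch is given below. The face detector, the placeholder flattened-pixel feature, and the parameter values are assumptions for illustration only and do not correspond to the hierarchical Gaussianization / SIFT / motion-feature ensemble used in this paper.

# Minimal sketch of a face detection -> feature extraction -> SVM pipeline.
# Illustrative only: the crude feature below is a stand-in for the feature
# ensemble described in this paper.
import cv2
import numpy as np
from sklearn.svm import SVC

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face_feature(image_bgr, size=(64, 64)):
    """Detect the largest face and return a crude flattened-pixel feature."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # keep largest detection
    crop = cv2.resize(gray[y:y + h, x:x + w], size)
    return crop.astype(np.float32).ravel() / 255.0

def train_emotion_svm(features, labels):
    """Train an RBF-kernel SVM on stacked per-image feature vectors."""
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    clf.fit(np.vstack(features), np.asarray(labels))
    return clf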
I. INTRODUCTION
AUTOMATED emotion recognition will soon have a sizeable impact in areas ranging from psychology to
human–computer interaction (HCI) to human–robot interaction
(HRI). For instance, in HRI and HCI, there is an ever-increasing demand to make computers and robots behave in a more human-like manner. Some example works that employ emotion recognition
in HCI and HRI are [1] and [2]. Another application is in
computer-aided automated learning [3]. Here, the computer
should ideally be able to identify the cognitive state of the
student and then act accordingly. For example, if the student
is gloomy, it might tell a joke.
The increasing applications of emotion recognition have attracted a great deal of research in this area in the past decade. Psychologists and linguists hold differing opinions about the relative importance of different cues in human affect judgment [3].
However, there are some studies (e.g., [4]) that indicate that
facial expression in the visual channel is the most effective and
important cue that correlates well with the body and the voice.
In this paper, we also use features extracted from the facial
region.
This work was carried out as part of the Facial Expression Recognition and Analysis Challenge (FERA 2011), held at the 9th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2011). Our results stood out in the final comparison: we ranked first for person-specific results and second in terms of overall performance [5]. It is worthwhile to note that the work [6] that
outperformed us in the overall results may face some limitations
in other testing scenarios, as outlined in Section X.
II. BACKGROUND WORK
Emotion recognition using visual cues has been receiving a
great deal of attention in the past decade. Most existing approaches recognize the six universal basic emotions because of their stability across cultures and ages and because of the availability of corresponding facial expression databases. The choices of features
employed for emotion recognition are classified in [3] into
two main categories, i.e., geometric features and appearance
features. In this section, we closely follow that taxonomy to
review some of the notable works on the topic.
The geometric features are extracted from the shape or salient
point locations of important facial components such as mouth
and eyes. In [7], 58 landmark points are used to construct an
active shape model (ASM); these points are then tracked to perform facial expression recognition. Pantic and Bartlett [8] introduced a set
of more refined features. They utilize facial characteristic points
around the mouth, the eyes, the eyebrows, the nose, and the
chin as geometric features for emotion recognition. In a more
holistic approach, the active appearance model is utilized to
analyze the characteristics of the facial expressions in [9].
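As a loose illustration of geometric features of this kind, the following sketch maps a set of landmark coordinates to a vector of pairwise inter-landmark distances after simple normalization; the normalization and distance choices are assumptions for illustration and are not the exact features of [7]-[9].

# Sketch: turn facial landmark coordinates into a simple geometric feature.
# Assumption: `landmarks` is an (N, 2) array of (x, y) key points around the
# mouth, eyes, eyebrows, nose, and chin (illustrative, not the cited methods).
import numpy as np

def geometric_feature(landmarks):
    pts = np.asarray(landmarks, dtype=np.float64)
    pts = pts - pts.mean(axis=0)              # remove translation
    pts = pts / (np.linalg.norm(pts) + 1e-8)  # rough scale normalization
    # Use all pairwise inter-landmark distances as the feature vector.
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    iu = np.triu_indices(len(pts), k=1)
    return dists[iu]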
When sequences of images are available, the temporal
dynamics of facial actions can be modeled for expression
recognition. In [10], Valstar et al. propose to characterize speed,
intensity, duration, and co-occurrence of facial-muscle activations in video sequences within a parameterized framework, which they then use to decide whether a behavior is deliberate or spontaneous.
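As a rough sketch of such temporal descriptors (not the parameterized framework of [10] itself), the following snippet estimates the peak intensity, mean speed, and active duration of a tracked landmark's displacement over a sequence; the threshold and frame-rate parameters are assumptions for illustration.

# Sketch: crude temporal statistics (speed, intensity, duration) for a single
# tracked landmark over a video sequence.
# Assumptions: `track` is a (T, 2) array of (x, y) positions, `fps` is the
# frame rate, and `active_thresh` is an illustrative activation threshold.
import numpy as np

def temporal_stats(track, fps=25.0, active_thresh=0.5):
    track = np.asarray(track, dtype=np.float64)
    displacement = np.linalg.norm(track - track[0], axis=1)  # vs. first frame
    velocity = np.diff(displacement) * fps                    # change per second
    active = displacement > active_thresh
    return {
        "peak_intensity": float(displacement.max()),
        "mean_speed": float(np.abs(velocity).mean()) if velocity.size else 0.0,
        "duration_s": float(active.sum() / fps),
    }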