Attention-aware Deep Reinforcement Learning for Video Face Recognition
Yongming Rao 1,2,3, Jiwen Lu 1,2,3,∗, Jie Zhou 1,2,3
1 Department of Automation, Tsinghua University, Beijing, China
2 State Key Lab of Intelligent Technologies and Systems, Beijing, China
3 Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing, China
raoyongming95@gmail.com; {lujiwen,jzhou}@tsinghua.edu.cn
Abstract
In this paper, we propose an attention-aware deep reinforcement learning (ADRL) method for video face recognition, which aims to discard misleading and confounding frames and find the focuses of attention in face videos for person recognition. We formulate the process of finding the attentions of videos as a Markov decision process and train the attention model through a deep reinforcement learning framework without using extra labels. Unlike existing attention models, our method takes information from both the image space and the feature space as input, making better use of face information that is discarded during the feature learning process. Moreover, our approach is attention-aware: it seeks different attentions for the recognition of different pairs of videos. Our approach achieves very competitive video face recognition performance on three widely used video face datasets.
1. Introduction
Video face recognition has attracted great attention in computer vision over the past few years [4, 7, 8, 15, 24, 31, 32, 40, 41, 43]. There are many practical applications for video face recognition, such as access control, video search, and visual surveillance. Compared to still face recognition, videos capture human faces from multiple views, which provides more useful information about a single face. However, video faces usually suffer from uncontrolled variations in pose, illumination, and other factors, which leads to large intra-class distances. Hence, it is desirable to design a model that integrates information across frames and reduces intra-class distances for effective and robust video face recognition.
There have been a variety of studies on how to effectively integrate information across frames for video face representation [6, 18, 21, 28, 43]. These methods exploit video information from all frames, which is usually considered
∗ Corresponding author.
[Figure 1 diagram: components labeled CNN, local recurrent layer, local temporal pooling, frame evaluation network, attention, and verification; stages labeled spatial representation learning, temporal representation learning, and attention-aware reinforcement learning; input from the image space.]
Figure 1. Flow-chart of our proposed method for video face recognition. Our approach takes a pair of face videos as the input and produces temporal-spatial representations for each frame by using multiple stacked modules: a convolutional neural network (CNN), a recurrent layer, and a pooling layer with locality constraints. Then, a hard attention model with a frame evaluation network is trained by the proposed deep reinforcement learning method, which finds the attentions of the video pair for face verification.
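To make the stacked modules in the caption concrete, the following toy NumPy sketch runs per-frame CNN features through a simple recurrent layer and a locally constrained temporal pooling step. This is a hypothetical illustration under stated assumptions, not the paper's implementation: the function names, weight initializations, and pooling window size are all invented for the example.

```python
import numpy as np

def recurrent_layer(frame_feats, W_h=None, W_x=None):
    """Simple recurrent pass over per-frame CNN features (a hypothetical
    stand-in for the paper's recurrent layer). W_h and W_x are assumed
    learned weights; identity-based defaults keep the sketch runnable."""
    T, D = frame_feats.shape
    W_h = np.eye(D) * 0.5 if W_h is None else W_h
    W_x = np.eye(D) if W_x is None else W_x
    h = np.zeros(D)
    outputs = np.empty((T, D))
    for t in range(T):
        h = np.tanh(W_h @ h + W_x @ frame_feats[t])  # recurrent state update
        outputs[t] = h
    return outputs

def local_temporal_pooling(feats, window=3):
    """Average each frame's feature with its temporal neighbors, a simple
    way to impose a locality constraint on the pooling."""
    T, D = feats.shape
    pooled = np.empty_like(feats)
    half = window // 2
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        pooled[t] = feats[lo:hi].mean(axis=0)
    return pooled

# Toy example: 5 frames with 4-dimensional CNN features.
rng = np.random.default_rng(0)
cnn_feats = rng.standard_normal((5, 4))
reps = local_temporal_pooling(recurrent_layer(cnn_feats))
print(reps.shape)  # (5, 4): one temporal-spatial representation per frame
```

The locality-constrained pooling smooths each frame's representation with its neighbors, so per-frame outputs reflect short-range temporal context rather than the whole video.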
to be of equal importance. However, some features are misleading and confounding, so low-quality frames may harm recognition performance. To address this, Yang et al. [43] proposed an attention-based method that finds the weights of features using information from the features themselves. However, information about image quality is reduced in the feature learning process [40], so information from the feature space alone is not reliable enough to find the most important parts (the precise focuses of attention) in videos.
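For intuition, feature-space soft attention in the spirit of [43] can be sketched as a learned query scoring each frame feature, followed by softmax-weighted aggregation. This is a hypothetical illustration, not the method of [43]: the query vector q and the function name are assumptions made for the example.

```python
import numpy as np

def soft_attention_aggregate(frame_feats, q):
    """Score each frame feature with a (assumed learned) query vector q,
    then return the softmax-weighted average of the features."""
    scores = frame_feats @ q               # one relevance score per frame
    w = np.exp(scores - scores.max())      # numerically stable softmax
    w /= w.sum()
    return w @ frame_feats                 # weighted aggregate feature

# Toy example: 6 frames with 4-dimensional features.
rng = np.random.default_rng(1)
feats = rng.standard_normal((6, 4))
q = rng.standard_normal(4)
video_rep = soft_attention_aggregate(feats, q)
print(video_rep.shape)  # (4,): a single aggregated video representation
```

When all scores are equal the weights become uniform and the aggregate reduces to the plain frame average, i.e. the equal-importance baseline discussed above; since the scores here come only from the features themselves, the sketch also exhibits the feature-space limitation that motivates using image-space information as well.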
In this work, we propose a new approach by introducing
the Markov decision process (MDP) [3] to remove these
misleading and confounding frames step by step with the
2017 IEEE International Conference on Computer Vision
2380-7504/17 $31.00 © 2017 IEEE
DOI 10.1109/ICCV.2017.424