interactions. Gao et al. [15] proposed to approximate the
second-order statistics via Tensor Sketch [35]. Yin et al.
[12] aggregated higher-order statistics by iteratively apply-
ing the Tensor Sketch compression to the features. Cai et al.
[2] used high-order pooling to aggregate hierarchical convo-
lutional responses. Moreover, bilinear pooling and high-order pooling methods have also been applied to the Visual-Question-Answering task [14, 22, 56, 57]. However, unlike the above methods, which mainly exploit high-order statistics on top of feature pooling and thus yield high-dimensional feature representations that are unsuitable for efficient/fast pedestrian search, we instead aim to enhance feature discrimination via attention learning. We model a high-order attention mechanism to capture the high-order and subtle differences among pedestrians and to produce discriminative attention proposals.
Zero-Shot Learning: In ZSL, the model is required to learn from the seen classes and then be capable of utilizing the learned knowledge to distinguish the unseen classes. It has been studied in image classification [28, 4], video recognition [13] and image retrieval/clustering [5]. Interestingly, person ReID matches the setting of ZSL well, since training identities have no intersection with testing identities, yet most existing ReID works ignore the problem of ZSL. To this end, we propose the Mixed High-Order Attention Network (MHN) to explicitly suppress the 'biased learning behavior of deep models' [5, 6] caused by ZSL, allowing the learning of all-sided attention information that may be useful for unseen identities and preventing the learning of biased attention information that only benefits the seen identities.
3. Proposed Approach
In this section, we first provide the formulation of the general attention mechanism in Sec. 3.1, then detail the proposed High-Order Attention (HOA) module in Sec. 3.2, and finally show the overall framework of our Mixed High-Order Attention Network (MHN) in Sec. 3.3.
3.1. Problem Formulation
Attention acts as a tool to bias the allocation of available
resources towards the most informative parts of an input. In
convolutional neural networks (CNNs), it is commonly used to reweight the convolutional response maps so as to highlight the important parts and suppress the uninformative ones, e.g. spatial attention [25, 27] and channel attention [19, 27]. We extend these two attention methods to a general case. Specifically, consider a convolutional activation output, a 3D tensor $\mathcal{X} \in \mathbb{R}^{C \times H \times W}$, encoded by the CNN from a given input image, where $C$, $H$ and $W$ denote the number of channels, the height and the width, respectively. As aforementioned, the goal of attention is to reweight the convolutional output; we thus formulate this process as:
$$\mathcal{Y} = \mathcal{A}(\mathcal{X}) \odot \mathcal{X} \quad (1)$$

where $\mathcal{A}(\mathcal{X}) \in \mathbb{R}^{C \times H \times W}$ is the attention proposal output by a certain attention module and $\odot$ is the Hadamard product (element-wise product). As $\mathcal{A}(\mathcal{X})$ serves as a reweighting term, the value of each element of $\mathcal{A}(\mathcal{X})$ should lie in the interval $[0, 1]$. Based on the above general formulation of attention, $\mathcal{A}(\mathcal{X})$ can take many different forms. For example, if $\mathcal{A}(\mathcal{X}) = \mathrm{rep}[M]|_C$, where $M \in \mathbb{R}^{H \times W}$ is a spatial mask and $\mathrm{rep}[M]|_C$ means replicating this spatial mask $M$ along the channel dimension $C$ times, Eq. 1 is the implementation of spatial attention. And if $\mathcal{A}(\mathcal{X}) = \mathrm{rep}[V]|_{H,W}$, where $V \in \mathbb{R}^C$ is a scale vector and $\mathrm{rep}[V]|_{H,W}$ means replicating this scale vector along the height and width dimensions $H$ and $W$ times respectively, Eq. 1 is the implementation of channel attention.
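As a concrete illustration, the following minimal PyTorch sketch instantiates Eq. 1 for both special cases; the tensor sizes and the random, sigmoid-squashed masks are placeholder assumptions standing in for the outputs of learned attention modules.

```python
import torch

# Minimal sketch of the general reweighting Y = A(X) ⊙ X in Eq. 1,
# assuming a single feature map X of shape (C, H, W).
C, H, W = 256, 24, 8
X = torch.randn(C, H, W)

# Spatial attention: A(X) = rep[M]|_C, an H×W mask replicated over all channels.
M = torch.sigmoid(torch.randn(H, W))             # sigmoid keeps values in (0, 1)
Y_spatial = M.unsqueeze(0).expand(C, H, W) * X   # replicate mask along channel dim

# Channel attention: A(X) = rep[V]|_{H,W}, a C-dim scale vector replicated spatially.
V = torch.sigmoid(torch.randn(C))
Y_channel = V.view(C, 1, 1).expand(C, H, W) * X  # replicate vector over H and W
```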
However, in spatial attention or channel attention, $\mathcal{A}(\mathcal{X})$ is coarse and unable to capture the high-order and complex interactions among parts, resulting in less discriminative attention proposals that fail to capture the subtle differences among pedestrians. To this end, we devote ourselves to modeling $\mathcal{A}(\mathcal{X})$ with high-order statistics.
3.2. High-Order Attention Module
To model the complex and high-order interactions within attention, we first define a linear polynomial predictor on top of the high-order statistics of $\mathbf{x}$, where $\mathbf{x} \in \mathbb{R}^C$ denotes a local descriptor at a specific spatial location of $\mathcal{X}$:

$$a(\mathbf{x}) = \sum_{r=1}^{R} \langle \mathbf{w}^r, \otimes_r \mathbf{x} \rangle \quad (2)$$

where $\langle \cdot, \cdot \rangle$ indicates the inner product of two same-sized tensors, $R$ is the number of orders, $\otimes_r \mathbf{x}$ is the $r$-th order outer-product of $\mathbf{x}$ that comprises all the degree-$r$ monomials in $\mathbf{x}$, and $\mathbf{w}^r$ is the $r$-th order tensor to be learned that contains the weights of the degree-$r$ variable combinations in $\mathbf{x}$.
Considering that $\mathbf{w}^r$ with large $r$ will introduce excessive parameters and incur the problem of overfitting, we suppose that when $r > 1$, $\mathbf{w}^r$ can be approximated by $D^r$ rank-1 tensors via tensor decomposition [23], i.e. $\mathbf{w}^r = \sum_{d=1}^{D^r} \alpha^{r,d}\, \mathbf{u}_1^{r,d} \otimes \cdots \otimes \mathbf{u}_r^{r,d}$, where $\mathbf{u}_1^{r,d} \in \mathbb{R}^C, \ldots, \mathbf{u}_r^{r,d} \in \mathbb{R}^C$ are vectors, $\otimes$ is the outer-product, and $\alpha^{r,d}$ is the weight of the $d$-th rank-1 tensor. Then, according to tensor algebra, Eq. 2 can be reformulated as:
a(x) = hw
1
, xi +
R
X
r=2
h
D
r
X
d=1
α
r,d
u
r,d
1
⊗ · · · ⊗ u
r,d
r
, ⊗
r
xi
= hw
1
, xi +
R
X
r=2
D
r
X
d=1
α
r,d
r
Y
s=1
hu
r,d
s
, xi
= hw
1
, xi +
R
X
r=2
hα
r
, z
r
i (3)