Figure 2: Illustration of the Inverted Dot-Product Attention Routing with the pose admitting matrix structure.
Procedure 1 Inverted Dot-Product Attention Routing algorithm returns updated poses of the capsules in layer $L+1$ given poses in layers $L$ and $L+1$ and weights between layers $L$ and $L+1$.
1: procedure INVERTED DOT-PRODUCT ATTENTION ROUTING($P^L$, $P^{L+1}$, $W^L$)
2:   for all capsule $i$ in layer $L$ and capsule $j$ in layer $(L+1)$: $v^L_{ij} \leftarrow W^L_{ij} \cdot p^L_i$   ▷ vote
3:   for all capsule $i$ in layer $L$ and capsule $j$ in layer $(L+1)$: $a^L_{ij} \leftarrow {p^{L+1}_j}^{\top} \cdot v^L_{ij}$   ▷ agreement
4:   for all capsule $i$ in layer $L$: $r^L_{ij} \leftarrow \exp(a^L_{ij}) \,/\, \sum_{j'} \exp(a^L_{ij'})$   ▷ routing coefficient
5:   for all capsule $j$ in layer $(L+1)$: $p^{L+1}_j \leftarrow \sum_i r^L_{ij} v^L_{ij}$   ▷ pose update
6:   for all capsule $j$ in layer $(L+1)$: $p^{L+1}_j \leftarrow \mathrm{LayerNorm}(p^{L+1}_j)$   ▷ normalization
7: return $P^{L+1}$
transformation is done using a learned transformation matrix $W^L_{ij}$:
$$v^L_{ij} = W^L_{ij} \cdot p^L_i, \qquad (1)$$
where the matrix $W^L_{ij} \in \mathbb{R}^{d_{L+1} \times d_L}$ if the pose has a vector structure and $W^L_{ij} \in \mathbb{R}^{\sqrt{d_{L+1}} \times \sqrt{d_L}}$ (requires $d_{L+1} = d_L$) if the pose has a matrix structure. Next, the agreement ($a^L_{ij}$) is computed by the dot-product similarity between a pose $p^{L+1}_j$ and a vote $v^L_{ij}$:
$$a^L_{ij} = {p^{L+1}_j}^{\top} \cdot v^L_{ij}. \qquad (2)$$
The pose $p^{L+1}_j$ is obtained from the previous iteration of this procedure, and will be set to $\mathbf{0}$ initially.
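To make this step concrete, below is a minimal PyTorch sketch of the vote and agreement computation for the vector-pose case; the tensor names and shapes (n_L lower-level capsules, n_L1 higher-level capsules) are illustrative assumptions, not the authors' released code.

import torch

def votes_and_agreements(pose_L, pose_L1, W):
    # pose_L:  (n_L, d_L)             poses of the lower-level capsules
    # pose_L1: (n_L1, d_L1)           poses of the higher-level capsules (zeros at the first iteration)
    # W:       (n_L, n_L1, d_L1, d_L) learned transformation matrices W^L_ij
    votes = torch.einsum('ijab,ib->ija', W, pose_L)           # Eq. (1): v^L_ij, shape (n_L, n_L1, d_L1)
    agreements = torch.einsum('ja,ija->ij', pose_L1, votes)   # Eq. (2): a^L_ij, shape (n_L, n_L1)
    return votes, agreements

For the matrix-pose variant, $W^L_{ij}$ and $p^L_i$ would instead be reshaped into $\sqrt{d} \times \sqrt{d}$ matrices, multiplied, and flattened back into vectors.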
Step 2: Computing Poses: The agreement scores $a^L_{ij}$ are passed through a softmax function to determine the routing probabilities $r^L_{ij}$:
$$r^L_{ij} = \frac{\exp(a^L_{ij})}{\sum_{j'} \exp(a^L_{ij'})}, \qquad (3)$$
where $r^L_{ij}$ is an inverted attention score representing how higher-level capsules compete for the attention of lower-level capsules. Using the routing probabilities, we update the pose $p^{L+1}_j$ for capsule $j$ in layer $L+1$ from all capsules in layer $L$:
$$p^{L+1}_j = \mathrm{LayerNorm}\!\left(\sum_i r^L_{ij}\, v^L_{ij}\right). \qquad (4)$$
We adopt Layer Normalization (Ba et al., 2016) as the normalization, which we empirically find improves the convergence of routing. The routing algorithm is summarized in Procedure 1 and Figure 2.
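Putting the two steps together, the following is a compact sketch of the full routing loop of Procedure 1 in PyTorch; the function name, tensor shapes, and number of routing iterations are assumptions made for exposition rather than the authors' released implementation.

import torch
import torch.nn.functional as F

def inverted_dot_product_attention_routing(pose_L, W, num_iters=2):
    # pose_L: (n_L, d_L) lower-level poses; W: (n_L, n_L1, d_L1, d_L) transformation matrices.
    n_L1, d_L1 = W.shape[1], W.shape[2]
    votes = torch.einsum('ijab,ib->ija', W, pose_L)                   # Eq. (1): votes, computed once
    pose_L1 = torch.zeros(n_L1, d_L1,
                          dtype=pose_L.dtype, device=pose_L.device)  # p^{L+1}_j set to 0 initially
    for _ in range(num_iters):
        agreements = torch.einsum('ja,ija->ij', pose_L1, votes)      # Eq. (2): dot-product agreement
        r = F.softmax(agreements, dim=1)                              # Eq. (3): softmax over higher-level capsules j
        pose_L1 = torch.einsum('ij,ija->ja', r, votes)                # Eq. (4): weighted sum of votes ...
        pose_L1 = F.layer_norm(pose_L1, (d_L1,))                      # ... followed by Layer Normalization
    return pose_L1

For instance, with pose_L of shape (32, 16) and W of shape (32, 10, 16, 16), calling inverted_dot_product_attention_routing(pose_L, W) would return ten higher-level poses of dimension 16. Because the softmax is taken over the higher-level capsules $j$, each lower-level capsule distributes a unit budget of routing probability among them, which is the sense in which the attention is inverted.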
4 INFERENCE AND LEARNING
To explain how inference and learning are performed, we use Figure 1 as an example. Note that the choice of the backbone, the number of capsule layers, the number of capsules per layer, and the design of the classifier may vary for different sets of experiments. We defer the discussion of these configurations to Sections 5 and 6 and to the Appendix.