hyperparameters are exactly the same as in (Weiler & Cesa, 2019), namely a Wide-ResNet-16-8 trained for 1000 epochs with random crops, horizontal flips and Cutout (DeVries & Taylor, 2017) as data augmentation. The group column describes the equivariance group in each of the three residual blocks. For example, $D_8 D_4 D_1$ means that the first block is equivariant under reflections and 8 rotations, the second under reflections and 4 rotations, and the last one only under reflections. All layers use regular representations. The $D_8$-equivariant layers use $5 \times 5$ filters to improve equivariance, whereas the other layers use $3 \times 3$ filters.
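For orientation, a single layer of such a block can be sketched with the e2cnn library of Weiler & Cesa (2019); the field multiplicities below are illustrative placeholders, not the exact Wide-ResNet-16-8 widths, and the group restriction between blocks (e.g. $D_8$ to $D_4$) is omitted for brevity.

```python
# Minimal sketch of a D8-equivariant layer with regular representations,
# built with the e2cnn library (Weiler & Cesa, 2019). The number of fields
# is an illustrative placeholder, not the Wide-ResNet-16-8 configuration.
import torch
from e2cnn import gspaces
from e2cnn import nn as enn

gspace = gspaces.FlipRot2dOnR2(N=8)                            # D8: reflections and 8 rotations
in_type = enn.FieldType(gspace, 3 * [gspace.trivial_repr])     # RGB input (trivial representation)
out_type = enn.FieldType(gspace, 16 * [gspace.regular_repr])   # 16 regular fields, |D8| = 16 channels each

conv = enn.R2Conv(in_type, out_type, kernel_size=5, padding=2) # 5x5 filters for the D8-equivariant layers
relu = enn.ReLU(out_type)

x = enn.GeometricTensor(torch.randn(1, 3, 96, 96), in_type)    # an STL-10-sized input
y = relu(conv(x))
print(y.tensor.shape)                                          # torch.Size([1, 256, 96, 96])
```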
Table 2: STL-10 results, again over six runs. All models except the vanilla CNN use regular representations, see main text for details.

Method        Groups          Error [%]      Params
Vanilla CNN   –               12.7 ± 0.2     11M
Kernels       $D_8D_4D_1$     10.7 ± 0.6     4.2M
              $D_4D_4D_1$     10.2 ± 0.4
FD            $D_8D_4D_1$     12.1 ± 0.6     3.2M
              $D_4D_4D_1$     12.1 ± 0.7
RBF-FD        $D_8D_4D_1$     14.3 ± 0.4
              $D_4D_4D_1$     14.3 ± 0.4
Gauss         $D_8D_4D_1$     11.2 ± 0.3
              $D_4D_4D_1$     10.6 ± 0.8
Discussion
While all equivariant models improve significantly over the non-equivariant CNN, the method of discretization plays an important role for PDOs. The reason that FD and RBF-FD underperform kernels is that they don't make full use of the stencil, since PDOs are inherently local operators. When a $5 \times 5$ stencil is used, the outermost entries are all very small compared to the inner ones, and even in $3 \times 3$ kernels, the four corners tend to be closer to zero (see Appendix K for images of stencils to illustrate this). Gaussian discretization performs significantly better and almost as well as kernels because its smoothing effect alleviates these issues. This fits the observation that kernels and Gaussian methods profit from using $5 \times 5$ kernels, whereas these do not help for FD and RBF-FD (and in fact decrease performance because of the smaller number of layers). We also observe a very small but rather consistent advantage of quotient representations over regular ones, demonstrating the practical usefulness of non-regular representations.
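To make the locality argument concrete, the following NumPy sketch (illustrative only, with an assumed $\sigma = 1$ Gaussian and a plain $\partial_x^2$ operator, not the paper's implementation) contrasts an FD stencil, whose outer ring is exactly zero on a $5 \times 5$ grid, with a Gaussian discretization that spreads mass over the full stencil:

```python
import numpy as np

# FD stencil for d^2/dx^2: the central difference [1, -2, 1] only touches a
# 1x3 neighbourhood, so on a 5x5 grid the outer ring stays exactly zero.
fd = np.zeros((5, 5))
fd[2, 1:4] = [1.0, -2.0, 1.0]

# Gaussian discretization (sketch): sample d^2/dx^2 applied to a 2D Gaussian,
# which smears the operator over the whole 5x5 support.
sigma = 1.0
coords = np.arange(-2, 3)
X, Y = np.meshgrid(coords, coords, indexing="xy")
g = np.exp(-(X**2 + Y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
gauss = (X**2 / sigma**4 - 1 / sigma**2) * g

print(np.round(fd, 3))      # zeros everywhere except the central row
print(np.round(gauss, 3))   # small but non-zero entries in the outer ring
```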
5 RELATED WORK
Equivariant networks
Equivariant neural networks have gained a lot of popularity in the last few years, starting with group convolutional neural networks (Cohen & Welling, 2016; Hoogeboom et al., 2018; Weiler et al., 2018b). These networks apply a filter in all $H$-transformed poses, where $H$ is the group under which equivariance is desired, such as $H = \mathbb{R}^d \rtimes G$. Classical CNNs are the special case where $G$ is trivial and filters are thus only translated. Because of their additional transformations of filters, feature maps for group convolutional networks are defined on $H$ rather than on the original input space $\mathbb{R}^d$. The learned filters are unrestricted and the equivariance is guaranteed by the form of the group convolution itself.
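As a concrete illustration (standard notation, not taken verbatim from this paper), the first, lifting layer of a group convolutional network evaluates a filter $\psi$ on the input $f : \mathbb{R}^d \to \mathbb{R}$ in every pose $h \in H$,
$$[f \star \psi](h) \;=\; \int_{\mathbb{R}^d} f(x)\, \psi\!\left(h^{-1} x\right) dx, \qquad h \in H = \mathbb{R}^d \rtimes G,$$
so that the resulting feature map is a function on $H$; subsequent layers convolve over $H$ itself.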
A somewhat different approach is taken by steerable CNNs (Cohen & Welling, 2017; Weiler et al., 2018a; Weiler & Cesa, 2019; Brandstetter et al., 2021). They represent a single feature as a map from the base space, such as $\mathbb{R}^d$, to a fiber $\mathbb{R}^c$ that is equipped with a representation $\rho$ of the point group $G$. In contrast to group convolutional networks, they thus use the same domain $\mathbb{R}^d$ as classical CNNs for feature maps. Instead, they extend the codomain from $\mathbb{R}$ to $\mathbb{R}^c$. If regular representations are used for $\rho$, steerable CNNs become equivalent to group convolutional networks with $H = \mathbb{R}^d \rtimes G$ as the symmetry group. The convolution operation used by steerable CNNs is simply the classical convolution, so to achieve equivariance, the filters that are used need to be restricted by the $G$-steerability constraint.
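For reference, the constraint from Weiler & Cesa (2019) requires a kernel $\kappa : \mathbb{R}^d \to \mathbb{R}^{c_\text{out} \times c_\text{in}}$ mapping between fields with representations $\rho_\text{in}$ and $\rho_\text{out}$ to satisfy
$$\kappa(g x) \;=\; \rho_\text{out}(g)\, \kappa(x)\, \rho_\text{in}(g)^{-1} \qquad \text{for all } g \in G,\; x \in \mathbb{R}^d,$$
which restricts the learnable filters to a linear subspace that can be parametrized by a steerable basis.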
Differential operators and deep learning
Analogies between partial differential operators and convolutions have been noted and exploited in several previous works (Ruthotto & Haber, 2020; Shen et al., 2020; Long et al., 2018). The common thread throughout these is that discretizing PDOs on regular grids naturally leads to convolutions with certain filters, as long as the PDO coefficients are spatially constant.
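A textbook example (not specific to any of the cited works): discretizing the Laplacian $\Delta = \partial_x^2 + \partial_y^2$ with central differences on a unit grid turns the operator into a convolution with a fixed $3 \times 3$ filter,
$$\Delta u(x, y) \;\approx\; u(x{+}1, y) + u(x{-}1, y) + u(x, y{+}1) + u(x, y{-}1) - 4\, u(x, y) \;=\; \begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix} \ast u,$$
whereas spatially varying coefficients would instead give a position-dependent filter.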
A different way in which differential operators have appeared in deep learning is via Neural ODEs (Chen et al., 2018) and related ideas (E, 2017; Ruthotto & Haber, 2020; Lu et al., 2018). There, a residual neural network is interpreted as approximately solving a differential equation, where depth