Esti. | VNS | VD
Exact | $0$ | $S_u^{(l)}$
NS | $(P_{uv_1}\mu_{v_1}^{(l)} - P_{uv_2}\mu_{v_2}^{(l)})^2$ | $\frac{n(u)}{D^{(l)}}\,S_u^{(l)}$
CV | $(P_{uv_1}\Delta\mu_{v_1}^{(l)} - P_{uv_2}\Delta\mu_{v_2}^{(l)})^2$ | $\big(3 + \frac{n(u)}{D^{(l)}}\big)\,S_u^{(l)}$
CVD | $(P_{uv_1}\Delta\mu_{v_1}^{(l)} - P_{uv_2}\Delta\mu_{v_2}^{(l)})^2$ | $S_u^{(l)}$

Table 2. Variance of different estimators. To save space we omit $\frac{C_u^{(l)}}{2D^{(l)}}\sum_{v_1,v_2\in n(u)}$ before all the VNS terms.
5.1. Control Variate for Dropout
With dropout, $\Delta h_v^{(l)} = h_v^{(l)} - \bar h_v^{(l)}$ is not necessarily small even if $\bar h_v^{(l)}$ and $h_v^{(l)}$ have the same distribution. We develop another stochastic approximation algorithm, control variate for dropout (CVD), which works well with dropout.
Our method is based on the weight scaling procedure (Srivastava et al., 2014) to approximately compute the mean $\mu_v^{(l)} := \mathbb{E}_M\big[h_v^{(l)}\big]$. That is, along with the dropout model, we can run a copy of the model without dropout to obtain the mean $\mu_v^{(l)}$, as illustrated in Fig. 1(d). We obtain a stochastic approximation by separating the mean and variance:
$$(PH^{(l)})_u = \sum_{v\in n(u)} P_{uv}\big(\mathring h_v^{(l)} + \Delta\mu_v^{(l)} + \bar\mu_v^{(l)}\big) \approx \mathrm{CVD}_u^{(l)} := \sqrt{R}\sum_{v\in\hat n} P_{uv}\mathring h_v^{(l)} + R\sum_{v\in\hat n} P_{uv}\Delta\mu_v^{(l)} + \sum_{v\in n(u)} P_{uv}\bar\mu_v^{(l)},$$
where $\hat n = \hat n^{(l)}(u)$ and $R = n(u)/D^{(l)}$ for short, $\mathring h_v^{(l)} = h_v^{(l)} - \mu_v^{(l)}$, $\bar\mu_v^{(l)}$ is the historical mean activation, obtained by storing $\mu_v^{(l)}$ instead of $h_v^{(l)}$, and $\Delta\mu_v^{(l)} = \mu_v^{(l)} - \bar\mu_v^{(l)}$. We separate $h_v^{(l)}$ into three terms: the latter two terms, which involve $\mu_v^{(l)}$, carry no randomness from dropout, and $\mu_v^{(l)}$ is treated just as $h_v^{(l)}$ is in the CV estimator. The first term has zero mean w.r.t. dropout, i.e., $\mathbb{E}_M \mathring h_v^{(l)} = 0$. We have $\mathbb{E}_{\hat n^{(l)}(u)}\mathbb{E}_M \mathrm{CVD}_u^{(l)} = 0 + \sum_{v\in n(u)} P_{uv}\big(\Delta\mu_v^{(l)} + \bar\mu_v^{(l)}\big) = \mathbb{E}_M (PH^{(l)})_u$, i.e., the estimator is unbiased, and we shall see in Sec. 5.2 that it eventually has the correct variance if the $h_v^{(l)}$'s are uncorrelated.
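To make the estimator concrete, the following NumPy sketch computes $\mathrm{CVD}_u^{(l)}$ for a single node and a single scalar feature; the function and argument names are illustrative assumptions, not part of the original algorithm description.

```python
import numpy as np

def cvd_estimate(P_row, h, mu, mu_bar, neighbors, sampled):
    """Sketch of the CVD estimator for one node u at one layer.

    P_row     : row P_{u,:} of the propagation matrix (1D array over all nodes).
    h         : activations h_v^{(l)} computed with dropout.
    mu        : mean activations mu_v^{(l)} from the dropout-free copy of the model.
    mu_bar    : historical mean activations stored from earlier iterations.
    neighbors : indices of n(u).
    sampled   : indices of the sampled neighbors n_hat^{(l)}(u), a subset of n(u).
    """
    R = len(neighbors) / len(sampled)      # R = n(u) / D^{(l)}
    h_ring = h - mu                        # zero-mean part w.r.t. dropout
    delta_mu = mu - mu_bar                 # change of the mean activation

    term_dropout = np.sqrt(R) * np.sum(P_row[sampled] * h_ring[sampled])
    term_delta   = R * np.sum(P_row[sampled] * delta_mu[sampled])
    term_hist    = np.sum(P_row[neighbors] * mu_bar[neighbors])
    return term_dropout + term_delta + term_hist
```

Averaging this quantity over both the dropout masks and the neighbor samples recovers $\sum_{v\in n(u)} P_{uv}\mu_v^{(l)} = \mathbb{E}_M(PH^{(l)})_u$, matching the unbiasedness claim above.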
5.2. Variance Analysis
We analyze the variance under the assumption that the node activations are uncorrelated, i.e., $\mathrm{Cov}_M\big[h_{v_1}^{(l)}, h_{v_2}^{(l)}\big] = 0, \forall v_1 \neq v_2$. We report the correlation between nodes empirically in Appendix G. To facilitate the analysis of the variance, we introduce two propositions, proven in Appendix A. The first helps the derivation of the dropout variance; the second implies that we can treat the variance introduced by neighbor sampling and by dropout separately.
Proposition 2. If $\hat n^{(l)}(u)$ contains $D^{(l)}$ samples from the set $n(u)$ without replacement, $x_1,\dots,x_V$ are random variables, $\forall v, \mathbb{E}[x_v] = 0$, and $\forall v_1\neq v_2, \mathrm{Cov}[x_{v_1}, x_{v_2}] = 0$, then
$$\mathrm{Var}_{X,\hat n^{(l)}(u)}\Big[\tfrac{n(u)}{D^{(l)}}\sum_{v\in\hat n^{(l)}(u)} x_v\Big] = \tfrac{n(u)}{D^{(l)}}\sum_{v\in n(u)}\mathrm{Var}[x_v].$$
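Proposition 2 is easy to verify numerically. The sketch below is a Monte Carlo check with Gaussian $x_v$ (any zero-mean, uncorrelated choice would do); the variable names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, trials = 10, 3, 200_000          # |n(u)|, D^{(l)}, number of Monte Carlo trials
sigma = rng.uniform(0.5, 2.0, size=n)  # per-node standard deviations of x_v

estimates = np.empty(trials)
for t in range(trials):
    x = rng.normal(0.0, sigma)                      # zero-mean, uncorrelated x_v
    idx = rng.choice(n, size=D, replace=False)      # n_hat^{(l)}(u): sample w/o replacement
    estimates[t] = (n / D) * x[idx].sum()           # (n(u)/D^{(l)}) * sum over the sample

print(estimates.var())                 # empirical variance
print((n / D) * (sigma ** 2).sum())    # predicted: (n(u)/D^{(l)}) * sum_v Var[x_v]
```

The two printed values agree up to Monte Carlo error.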
Proposition 3. $X$ and $Y$ are two random variables, and $f(X,Y)$ and $g(Y)$ are two functions. If $\mathbb{E}_X f(X,Y) = 0$, then
$$\mathrm{Var}_{X,Y}[f(X,Y) + g(Y)] = \mathrm{Var}_{X,Y} f(X,Y) + \mathrm{Var}_Y g(Y).$$
By Proposition 3, $\mathrm{Var}_{\hat n, M}\,\mathrm{CVD}_u^{(l)}$ can be written as the sum of $\mathrm{Var}_{\hat n, M}\big[\sqrt{R}\sum_{v\in\hat n} P_{uv}\mathring h_v^{(l)}\big]$ and $\mathrm{Var}_{\hat n}\big[R\sum_{v\in\hat n} P_{uv}\Delta\mu_v^{(l)} + \sum_{v\in n(u)} P_{uv}\bar\mu_v^{(l)}\big]$. We refer to the first term as the variance from dropout (VD) and to the second term as the variance from neighbor sampling (VNS). Ideally, VD should equal the variance of $(PH^{(l)})_u$ and VNS should be zero. VNS can be derived by replicating the analysis in Sec. 3.2, replacing $h$ with $\mu$. Let $s_v^{(l)} = \mathrm{Var}_M h_v^{(l)} = \mathrm{Var}_M \mathring h_v^{(l)}$ and $S_u^{(l)} = \mathrm{Var}_M (PH^{(l)})_u = \sum_{v\in n(u)} P_{uv}^2 s_v^{(l)}$. By Proposition 2, the VD of $\mathrm{CVD}_u^{(l)}$, $\sum_{v\in n(u)} P_{uv}^2 \mathrm{Var}\big[\mathring h_v^{(l)}\big] = S_u^{(l)}$, equals the VD of the exact estimator, as desired.
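Spelling out the application of Proposition 2 to the dropout term (with $x_v = P_{uv}\mathring h_v^{(l)}$), the VD of $\mathrm{CVD}_u^{(l)}$ is
$$\mathrm{Var}_{\hat n, M}\Big[\sqrt{R}\sum_{v\in\hat n} P_{uv}\mathring h_v^{(l)}\Big] = \frac{1}{R}\,\mathrm{Var}_{\hat n, M}\Big[R\sum_{v\in\hat n} P_{uv}\mathring h_v^{(l)}\Big] = \frac{1}{R}\cdot R\sum_{v\in n(u)} P_{uv}^2\, s_v^{(l)} = S_u^{(l)},$$
so the $\sqrt{R}$ scaling is exactly what cancels the $n(u)/D^{(l)}$ inflation seen in the NS row of Table 2.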
We summarize the estimators and their variances in Table 2,
where the derivations are in Appendix A. As in Sec. 3.2,
VNS of CV and CVD depends on
∆µ
v
, which converges to
zero as the training progresses, while VNS of NS depends
on the non-zero
µ
v
. On the other hand, CVD is the only
estimator except the exact one that gives correct VD.
5.3. Preprocessing Strategy
There are two possible models adopting dropout, $Z^{(l+1)} = P\,\mathrm{Dropout}_p(H^{(l)})W^{(l)}$ or $Z^{(l+1)} = \mathrm{Dropout}_p(PH^{(l)})W^{(l)}$. The difference is whether the dropout layer is placed before or after neighbor averaging. Kipf & Welling (2017) adopt the former and we adopt the latter; the two models perform similarly in practice, as we shall see in Sec. 6.1. The advantage of the latter model is that we can preprocess $U^{(0)} = PH^{(0)} = PX$ and take $U^{(0)}$ as the new input. In this way, the actual number of graph convolution layers is reduced by one: the first layer is merely a fully-connected layer instead of a graph convolution one. Since most GCNs only have two graph convolution layers (Kipf & Welling, 2017; Hamilton et al., 2017a), this gives a significant reduction of the receptive field size and speeds up the computation. We refer to this optimization as the preprocessing strategy; a sketch is given below.
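A minimal sketch of the preprocessing strategy, assuming a propagation matrix `P` (sparse or dense) and a dense feature matrix `X`; the function names and the inverted-dropout details are illustrative, not the paper's implementation. The product $U^{(0)} = PX$ is computed once before training, so the first layer no longer touches the graph.

```python
import numpy as np

def preprocess_features(P, X):
    """Compute U^(0) = P H^(0) = P X once, before training starts.
    P may be a scipy.sparse matrix or a dense array; @ works either way."""
    return P @ X

def first_layer(U0_batch, W0, keep_prob, rng, training=True):
    """First layer under Z^(1) = Dropout_p(P H^(0)) W^(0).

    Neighbor averaging is already folded into U0, so this reduces to
    dropout followed by a dense (fully-connected) transform.
    """
    if training:
        mask = rng.random(U0_batch.shape) < keep_prob
        U0_batch = U0_batch * mask / keep_prob   # inverted dropout
    return U0_batch @ W0
```

Because $U^{(0)}$ is fixed, only the remaining graph convolution layer(s) require neighbor sampling and historical activations, which is where the reduction in receptive field size comes from.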
6. Experiments
We examine the variance and convergence of our algorithms empirically on six datasets, including Citeseer, Cora,