3.2. Variational training
The parameters of a DGMRF can be trained by maximizing the log marginal likelihood $\log p(y_m \mid \theta)$. This is however often infeasible as it requires computing the determinant of the posterior precision matrix $\tilde{Q}$ (Sidén & Lindsten, 2020).
For large $N$ one can instead resort to variational inference, maximizing the Evidence Lower Bound (ELBO),
$$\mathrm{ELBO}(\theta, \phi) = \mathbb{E}_{q(x \mid \phi)}\left[\log p(y_m, x \mid \theta)\right] + H[q(x \mid \phi)] \le \log p(y_m \mid \theta) \tag{7}$$
where $q$ is a variational distribution with parameters $\phi$ and $H[\cdot]$ refers to differential entropy. For a DGMRF with a Gaussian likelihood the first term of the ELBO is
$$\mathbb{E}_{q(x \mid \phi)}\left[\log p(y_m, x \mid \theta)\right] = -\frac{1}{2}\,\mathbb{E}_{q(x \mid \phi)}\left[ g(x)^{\top} g(x) + \frac{1}{\sigma^2}(y_m - x)^{\top} I_m (y_m - x) \right] + \log|\det(G)| - M \log \sigma + \text{const.} \tag{8}$$
where $M = \sum_{i=1}^{N} m_i$ is the number of observed nodes.
The expectation in Eq. 8 can be estimated using a set of samples drawn from $q$. As $G = G^{(L)} G^{(L-1)} \cdots G^{(1)}$, the log-(absolute)-determinant is given by
$$\log|\det(G)| = \sum_{l=1}^{L} \log\left|\det\left(G^{(l)}\right)\right|. \tag{9}$$
Computing this efficiently is one of the major challenges
with the general graph setting, as will be discussed further
in section 3.3.
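For concreteness, the sampling-based estimate of the expectation in Eq. 8 could be implemented as in the following sketch (PyTorch-style Python). All names, such as `g`, `x_samples` and `obs_mask`, are illustrative placeholders rather than the paper's implementation, and the log-determinant term is treated as a precomputed quantity (see Section 3.3).

```python
import torch

def data_term_estimate(x_samples, g, y, obs_mask, sigma, log_det_G):
    """Monte Carlo estimate of Eq. 8 (up to the additive constant).

    x_samples: (num_samples, N) tensor of samples from q(x | phi)
    g:         function returning g(x) for a batch of samples, shape (num_samples, N)
    y:         (N,) observations (entries outside obs_mask are ignored)
    obs_mask:  (N,) float 0/1 mask m selecting observed nodes
    sigma:     scalar tensor, observation noise standard deviation
    log_det_G: precomputed log|det(G)| (Eq. 9, Section 3.3)
    """
    gx = g(x_samples)                                  # (num_samples, N)
    prior_quad = (gx ** 2).sum(dim=1)                  # g(x)^T g(x), per sample
    resid = (y - x_samples) * obs_mask                 # I_m (y_m - x), per sample
    data_quad = (resid ** 2).sum(dim=1) / sigma ** 2   # (1/sigma^2)(y_m - x)^T I_m (y_m - x)
    M = obs_mask.sum()                                 # number of observed nodes
    return (-0.5 * (prior_quad + data_quad).mean()     # average over the samples
            + log_det_G - M * torch.log(sigma))
```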
The full set of model parameters $\theta$ consists of the trainable parameters of each layer and the noise standard deviation $\sigma$. Maximizing the ELBO w.r.t. $\theta$ and $\phi$ can be done using gradient-based stochastic optimization.
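Schematically, such an optimization loop could look as below; `elbo_estimate`, `model_params` and `var_params` are assumed, hypothetical names for an ELBO estimator and the parameter collections $\theta$ and $\phi$, not part of the original implementation.

```python
import torch

# model_params (theta) and var_params (phi) are iterables of torch.nn.Parameter
optimizer = torch.optim.Adam(list(model_params) + list(var_params), lr=1e-2)
num_steps = 10000

for step in range(num_steps):
    optimizer.zero_grad()
    loss = -elbo_estimate(model_params, var_params)  # maximize the ELBO = minimize its negative
    loss.backward()   # gradients flow through the reparametrized samples (Section 3.2.1)
    optimizer.step()
```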
3.2.1. VARIATIONAL DISTRIBUTION
A natural and useful way to choose the variational distribution is as another Gaussian $q(x \mid \phi) = \mathcal{N}(x \mid \nu, SS^{\top})$. This corresponds to defining $q$ by another affine transformation in the opposite direction of the DGMRF,
$$x = Sr + \nu, \quad r \sim \mathcal{N}(0, I). \tag{10}$$
Note the difference to Eq. 2 as we here parametrize the
covariance matrix instead of the precision matrix. This
parametrization additionally allows for computing gradi-
ents through the sampling process, by the use of the
reparametrization trick (Kingma & Welling, 2014).
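A minimal sketch of such reparametrized sampling, assuming a function `apply_S` that multiplies a batch of vectors by $S$ (for instance the structured choice in Eq. 11 below), is:

```python
import torch

def sample_q(num_samples, nu, apply_S):
    """Draw x = S r + nu with r ~ N(0, I) (Eq. 10).

    Since x is a deterministic, differentiable function of the variational
    parameters, ELBO gradients w.r.t. phi can be backpropagated through the
    samples (the reparametrization trick).
    """
    r = torch.randn(num_samples, nu.shape[0])  # standard Gaussian noise, shape (num_samples, N)
    return apply_S(r) + nu                     # shape (num_samples, N)
```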
Sidén & Lindsten (2020) use a simple mean field approximation with a diagonal $S$, making all components of $x$ independent (Bishop, 2006). However, we propose a more flexible $q$ by choosing
$$S = \operatorname{diag}(\xi)\, \tilde{G}\, \operatorname{diag}(\tau) \tag{11}$$
where $\xi, \tau \in \mathbb{R}^N$ are vectors containing positive parameters and $\tilde{G}$ is defined in the same way as the DGMRF layer in Eq. 6. Including the matrix $\tilde{G}$ in $S$ introduces off-diagonal elements in the covariance matrix of $q$, alleviating the independence assumption between nodes. Multiple such layers can also be used, introducing longer dependencies between nodes in the graph. The full set of variational parameters $\phi$ is then $\nu$, $\xi$, $\tau$ and all trainable parameters from the layer(s) $\tilde{G}$. In Appendix A.2 we empirically show that DGMRFs trained using our more flexible variational distribution consistently outperform those trained using the simple mean field approximation.
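In practice $S$ never needs to be formed as a dense matrix; it suffices to apply it to a batch of noise vectors using sparse operations on the graph. A sketch, assuming (consistently with Eq. 13) that the layer has the form $\tilde{G} = D^{\gamma}(\alpha I + \beta D^{-1}A)$ and with all variable names illustrative, is:

```python
import torch

def apply_S(r, xi, tau, alpha, beta, gamma, deg, A_sparse):
    """Apply S = diag(xi) G_tilde diag(tau) to a batch of vectors r (Eq. 11).

    r:        (num_samples, N) batch of noise vectors
    xi, tau:  (N,) positive variational parameter vectors
    deg:      (N,) node degrees (diagonal of D)
    A_sparse: (N, N) sparse adjacency matrix
    """
    z = r * tau                                # diag(tau) r
    Az = torch.sparse.mm(A_sparse, z.T).T      # A z for every sample in the batch
    z = alpha * z + beta * Az / deg            # (alpha I + beta D^{-1} A) z
    z = z * deg ** gamma                       # D^gamma z
    return z * xi                              # diag(xi) (...)
```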
With this choice of $S$ the entropy term of the ELBO is
$$H[q(x \mid \phi)] = \log|\det(S)| + \text{const.} = \log\left|\det\left(\tilde{G}\right)\right| + \sum_{i=1}^{N}\left(\log \xi_i + \log \tau_i\right) + \text{const.} \tag{12}$$
Re-using the DGMRF layer construct in $q$ has the added benefit that the techniques we develop for the log-determinant computation readily extend also to computing $H[q(x \mid \phi)]$.
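Given any estimate of $\log|\det(\tilde{G})|$, evaluating Eq. 12 then amounts to a few elementwise operations. In the sketch below, $\xi$ and $\tau$ are assumed to be parametrized through their logarithms to keep them positive; the names are again illustrative.

```python
import torch

def entropy_term(log_det_G_tilde, log_xi, log_tau):
    """Entropy of q up to an additive constant (Eq. 12)."""
    # log|det(S)| = log|det(G_tilde)| + sum_i (log xi_i + log tau_i)
    return log_det_G_tilde + log_xi.sum() + log_tau.sum()
```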
3.3. Computing the log-determinant
Computing the necessary log-determinants in Eq. 9 and 12 efficiently is a major challenge with the general graph setting. The CNN-based DGMRF was defined on a lattice graph, which creates a special structure in $G$ and allows for finding efficient closed-form expressions for the log-determinants (Sidén & Lindsten, 2020). As we do not make any such assumptions on the graph structure, we here propose new scalable methods to compute the log-determinants.
3.3.1. EIGENVALUE METHOD
One way to compute the log-determinant is based on the eigenvalues of the matrix. As the determinant is given by the product of all eigenvalues,
$$\log\left|\det\left(G^{(l)}\right)\right| = \log\left|\det\left(D^{\gamma_l}\right)\right| + \log\left|\det\left(\alpha_l I + \beta_l D^{-1} A\right)\right| = \sum_{i=1}^{N}\left[\gamma_l \log(d_i) + \log|\lambda_i|\right] \tag{13}$$
where $\{\lambda_i\}_{i=1}^{N}$ are the eigenvalues of $\alpha_l I + \beta_l D^{-1} A$. It can be shown¹ that $\lambda_i = \alpha_l + \beta_l \lambda'_i$, with $\lambda'_i$ being the $i$:th eigenvalue of $D^{-1}A$.
¹ For an eigenvector $v_i$ of $D^{-1}A$ with eigenvalue $\lambda'_i$, $D^{-1}A v_i = \lambda'_i v_i \Rightarrow (\alpha_l I + \beta_l D^{-1}A) v_i = (\alpha_l + \beta_l \lambda'_i) v_i$.
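A sketch of this eigenvalue method is given below. The eigenvalues of $D^{-1}A$ depend only on the graph, so they can be computed once before training and reused for every layer and iteration; the dense eigendecomposition shown here is purely illustrative, and all variable names are assumptions rather than the paper's implementation.

```python
import torch

# Precompute the eigenvalues of D^{-1} A once; they depend only on the graph,
# not on the per-layer parameters. A_dense is the (N, N) adjacency matrix and
# deg the (N,) vector of node degrees (both assumed given).
DinvA = A_dense / deg[:, None]            # D^{-1} A
lam_prime = torch.linalg.eigvals(DinvA)   # complex in general, since D^{-1} A is non-symmetric

def layer_log_det(alpha_l, beta_l, gamma_l, deg, lam_prime):
    """log|det(G^(l))| via Eq. 13."""
    # gamma_l * sum_i log(d_i)  +  sum_i log|alpha_l + beta_l * lam'_i|
    return (gamma_l * torch.log(deg).sum()
            + torch.log(torch.abs(alpha_l + beta_l * lam_prime)).sum())

# Eq. 9: the total log-determinant is the sum of the per-layer terms, e.g.
# log_det_G = sum(layer_log_det(a, b, g, deg, lam_prime) for (a, b, g) in layer_params)
```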