Our framework (GRAPHEDM) is general. Specific choices of the aforementioned (encoder and decoder) networks
allow GRAPHEDM to realize specific graph embedding methods. GRAPHEDM is illustrated in Figure 2. Before
presenting the taxonomy and showing realizations of various methods using our framework, we briefly discuss an
application perspective.
Output The GRAPHEDM model can return a reconstructed graph similarity matrix $\widehat{W}$ (often used to train unsupervised embedding algorithms), as well as output labels $\hat{y}^S$ for supervised applications. The label output space varies depending on the supervised application.
• Node-level supervision, with $\hat{y}^N \in \mathcal{Y}^{|V|}$, where $\mathcal{Y}$ represents the node label space. If $\mathcal{Y}$ is categorical, then this is also known as (semi-)supervised node classification (Section 6.2.1), in which case the label decoder network produces labels for each node in the graph. If the $d$-dimensional $Z$ is such that $d = |\mathcal{Y}|$, then the label decoder network can simply be a softmax activation across the rows of $Z$, producing a distribution over labels for each node (a minimal sketch of this and the other label decoders is given after this list). Additionally, the graph decoder network might also be leveraged in supervised node classification tasks, as it can be used to regularize embeddings (e.g. neighboring nodes should have nearby embeddings, regardless of node labels).
• Edge-level supervision, with $\hat{y}^E \in \mathcal{Y}^{|V| \times |V|}$, where $\mathcal{Y}$ represents the edge label space. For example, $\mathcal{Y}$ can be multinomial in knowledge graphs (for describing the types of relationships between two entities), setting $\mathcal{Y} = \{0, 1\}^{\#(\text{relation types})}$. It is common to have $\#(\text{relation types}) = 1$, in which case edge relations are binary and the task is known as link prediction. In this review, when $\hat{y}^E \in \{0, 1\}^{|V| \times |V|}$ (i.e. $\mathcal{Y} = \{0, 1\}$), rather than naming the output of the decoder $\hat{y}^E$, we instead follow the standard nomenclature and position link prediction as an unsupervised task (Section 4). Then, in lieu of $\hat{y}^E$, we use $\widehat{W}$, the output of the graph decoder network (which is trained to reconstruct the target similarity matrix $s(W)$), to rank potential edges.
• Graph-level supervision, with $\hat{y}^G \in \mathcal{Y}$. In the graph classification task (Section 6.2.2), the label decoder network converts node embeddings $Z$, using the input adjacency matrix $W$, into graph labels via graph pooling. More concretely, the graph pooling operation is similar to pooling in standard CNNs, where the goal is to downsample local feature representations to capture higher-level information. However, unlike images, graphs do not have a regular grid structure, and it is hard to define a pooling pattern that could be applied to every node in the graph. One possible approach is graph coarsening, which groups similar nodes into clusters to produce smaller graphs [32]. Other graph pooling methods exist, such as DiffPool [120] or SortPooling [123], the latter of which creates an ordering of nodes based on their structural roles in the graph. We do not cover the details of graph pooling operators here and refer the reader to recent surveys [116] for more details about graph pooling.
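To make these three output types concrete, the following is a minimal NumPy sketch of possible label decoders: a row-wise softmax over $Z$ for node labels, an inner-product graph decoder producing $\widehat{W}$ for ranking candidate edges, and mean pooling followed by a linear layer for graph labels. The function names and the specific decoder choices (inner-product reconstruction, mean pooling) are illustrative assumptions, not components prescribed by GRAPHEDM.

```python
# Illustrative sketch of the three label-decoder flavours described above.
# Decoder choices (row-wise softmax, inner-product decoder, mean pooling) are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def node_label_decoder(Z):
    """Node-level: when d = |Y|, a row-wise softmax over Z yields a
    distribution over labels for every node (shape |V| x |Y|)."""
    return softmax(Z, axis=1)

def graph_decoder(Z):
    """Link prediction: reconstruct a similarity matrix W-hat (here with a
    simple inner-product decoder) whose entries rank candidate edges."""
    return Z @ Z.T

def graph_label_decoder(Z, W_out):
    """Graph-level: pool node embeddings into one graph representation
    (mean pooling here; coarsening, DiffPool or SortPooling in practice),
    then map it to graph-label scores with a linear layer W_out."""
    h_graph = Z.mean(axis=0)           # (d,)
    return softmax(h_graph @ W_out)    # distribution over graph labels

# Toy usage with random embeddings for a 5-node graph.
rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 3))            # |V| = 5 nodes, d = 3 = |Y|
W_out = rng.normal(size=(3, 2))        # 2 graph classes
print(node_label_decoder(Z).shape)     # (5, 3): per-node label distributions
print(graph_decoder(Z).shape)          # (5, 5): reconstructed similarity matrix
print(graph_label_decoder(Z, W_out))   # distribution over 2 graph labels
```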
3.2 Taxonomy of objective functions
We now focus our attention on the optimization of models that can be described in the GRAPHEDM framework by describing the loss functions used for training. Let $\Theta = \{\Theta^E, \Theta^D, \Theta^S\}$ denote all model parameters. GRAPHEDM models can be optimized using a combination of the following loss terms:
• Supervised loss term, $\mathcal{L}^S_{\mathrm{SUP}}$, which compares the predicted labels $\hat{y}^S$ to the ground truth labels $y^S$. This term depends on the task the model is being trained for. For instance, in semi-supervised node classification tasks ($S = N$), the graph vertices are split into labelled and unlabelled nodes ($V = V_L \cup V_U$), and the supervised loss is computed for each labelled node in the graph:
$$\mathcal{L}^N_{\mathrm{SUP}}(y^N, \hat{y}^N; \Theta) = \sum_{i \,\mid\, v_i \in V_L} \ell(y^N_i, \hat{y}^N_i; \Theta),$$
where $\ell(\cdot)$ is the loss function used for classification (e.g. cross-entropy). Similarly, for graph classification tasks ($S = G$), the supervised loss is computed at the graph level and can be summed across multiple training graphs:
$$\mathcal{L}^G_{\mathrm{SUP}}(y^G, \hat{y}^G; \Theta) = \ell(y^G, \hat{y}^G; \Theta).$$
A minimal sketch of these supervised loss terms is given below.
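The following sketch implements the two supervised loss terms above, assuming cross-entropy as $\ell$ and one-hot labels; the masking over labelled nodes mirrors the sum over $\{i \mid v_i \in V_L\}$. Function names and the one-hot encoding are assumptions made for the sake of the example.

```python
# Illustrative sketch of the supervised loss terms, assuming cross-entropy as the
# classification loss and per-node label distributions (e.g. from a softmax decoder).
import numpy as np

def cross_entropy(y_true_onehot, y_pred_probs, eps=1e-12):
    """l(y, y_hat): cross-entropy between one-hot targets and predicted distributions."""
    return -np.sum(y_true_onehot * np.log(y_pred_probs + eps), axis=-1)

def node_supervised_loss(y_N, y_hat_N, labelled_mask):
    """L^N_SUP: sum of l over labelled nodes only (semi-supervised node classification)."""
    per_node = cross_entropy(y_N, y_hat_N)           # shape (|V|,)
    return per_node[labelled_mask].sum()

def graph_supervised_loss(y_G, y_hat_G):
    """L^G_SUP: a single graph-level loss; sum it over the graphs in a training batch."""
    return cross_entropy(y_G, y_hat_G)

# Toy usage: 4 nodes, 3 classes, only nodes 0 and 2 are labelled (V_L).
y_N = np.eye(3)[[0, 1, 2, 0]]                        # one-hot node labels
y_hat_N = np.full((4, 3), 1.0 / 3)                   # uniform predictions
labelled = np.array([True, False, True, False])      # membership in V_L
print(node_supervised_loss(y_N, y_hat_N, labelled))  # approx. 2 * log(3)
```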
• Graph regularization loss term, $\mathcal{L}_{G,\mathrm{REG}}$, which leverages the graph structure to impose regularization constraints on the model parameters. This loss term measures the distance between the decoded similarity matrix