Automated Machine Learning on Graphs: A Comprehensive Survey

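"Automated Machine Learning on Graphs: A Survey", published by a research team at Tsinghua University, examines the emerging field of automated machine learning on graphs (AutoML on graphs). The field addresses a growing problem: as the graph learning literature expands rapidly, manually designing the optimal graph machine learning algorithm becomes harder. The paper focuses on hyper-parameter optimization (HPO) and neural architecture search (NAS) for graph machine learning, and it surveys libraries for automated graph machine learning, in particular AutoGL, the first dedicated open-source library for AutoML on graphs.

Automated machine learning on graphs has developed quickly in both academia and industry in recent years. It combines the strengths of graph machine learning and automated machine learning to handle increasingly complex graph data. Graph machine learning deals with graph-structured, non-Euclidean data such as social networks, biological networks, and knowledge graphs, while automated machine learning aims to optimize model performance through automated pipelines with less manual intervention. The paper points out that, with so many graph learning methods and techniques emerging, manually selecting and tuning the best algorithm for each task has become extremely difficult. Automated machine learning on graphs responds to this by automating model selection, hyper-parameter optimization, and architecture search, which improves efficiency and lowers the barrier to development.

HPO is a key component of automated machine learning on graphs. It searches for the hyper-parameter combination that maximizes a model's predictive performance. In graph learning, these hyper-parameters may include the embedding dimension used for node representation learning, the number of graph convolutional layers, and the learning rate. An effective HPO strategy can noticeably improve a model's generalization.

NAS is the other main focus. On graph data, it means searching for the best graph neural network (GNN) architecture, for example different combinations of message-passing layers and pooling operations. NAS aims to automate this process so that the network structure is adapted to the task at hand and performs better as a result.

The paper also covers several libraries for automated graph machine learning, which give researchers tools and frameworks for AutoML on graphs. AutoGL stands out as the first open-source library designed specifically for AutoML on graphs. It includes multiple HPO and NAS algorithms along with a rich set of graph learning modules, and it is intended to support research on, and applications of, automated graph data processing.

Finally, the authors share their views on the future of automated machine learning on graphs, including possible research directions, challenges, and potential application scenarios. They expect that, as computing power grows and algorithms improve, the field will play a larger role in areas such as data analysis, drug discovery, and recommender systems, while facing challenges in interpretability, efficiency, and generality. Overall, the survey gives a full picture of the current state and prospects of automated machine learning on graphs and serves as a useful reference for researchers.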

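To make the HPO description above concrete, here is a minimal sketch of random search, one of the simplest HPO strategies, over the three hyper-parameters the summary names (embedding dimension, number of layers, learning rate). The candidate values, the function names, and the scoring formula are all assumptions made for illustration; `toy_validation_score` is an arbitrary placeholder for "train a graph model and measure validation performance" and is not taken from the paper or from AutoGL.

```python
import math
import random

# Hypothetical search space: the three hyper-parameters come from the summary
# above, but the candidate values are assumptions made for this sketch.
SEARCH_SPACE = {
    "embedding_dim": [16, 32, 64, 128],
    "num_layers": [1, 2, 3, 4],
    "log10_learning_rate": (-4.0, -1.0),  # sampled uniformly in log space
}


def sample_config(rng):
    """Draw one configuration at random from SEARCH_SPACE."""
    low, high = SEARCH_SPACE["log10_learning_rate"]
    return {
        "embedding_dim": rng.choice(SEARCH_SPACE["embedding_dim"]),
        "num_layers": rng.choice(SEARCH_SPACE["num_layers"]),
        "learning_rate": 10 ** rng.uniform(low, high),
    }


def toy_validation_score(config):
    """Placeholder for training a graph model and scoring it on validation data.

    The formula is arbitrary and exists only to make the sketch runnable.
    """
    penalty = (
        0.10 * abs(config["num_layers"] - 2)
        + 0.05 * abs(math.log2(config["embedding_dim"] / 64))
        + 0.10 * abs(math.log10(config["learning_rate"]) + 2)
    )
    return 1.0 - penalty


def random_search(evaluate, n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    config, score = random_search(toy_validation_score, n_trials=50)
    print(config, round(score, 3))
```

In a real setting, `evaluate` would train and validate an actual graph model, which is typically the expensive step and the reason the paper emphasizes scalable HPO methods.
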
Automated Machine Learning on Graphs: A Survey

Ziwei Zhang∗, Xin Wang∗ and Wenwu Zhu

Tsinghua University, Beijing, China

zw-zhang16@mails.tsinghua.edu.cn, {xin_wang,wwzhu}@tsinghua.edu.cn

Abstract

Machine learning on graphs has been extensively studied in both academia and industry. However, as the literature on graph learning booms with a vast number of emerging methods and techniques, it becomes increasingly difficult to manually design the optimal machine learning algorithm for different graph-related tasks. To solve this critical challenge, automated machine learning (AutoML) on graphs, which combines the strengths of graph machine learning and AutoML, is gaining attention from the research community. Therefore, we comprehensively survey AutoML on graphs in this paper, primarily focusing on hyper-parameter optimization (HPO) and neural architecture search (NAS) for graph machine learning. We further overview libraries related to automated graph machine learning and discuss in depth AutoGL, the first dedicated open-source library for AutoML on graphs. In the end, we share our insights on future research directions for automated graph machine learning. To the best of our knowledge, this paper is the first systematic and comprehensive review of automated machine learning on graphs.

1 Introduction

Graph data is ubiquitous in our daily life. We can use graphs to model the complex relationships and dependencies between entities, ranging from small molecules in proteins and particles in physical simulations to large nationwide power grids and global airlines. Therefore, machine learning on graphs has long been an important research direction for both academia and industry [Newman, 2018]. In particular, network embedding [Cui et al., 2018; Hamilton et al., 2017; Goyal and Ferrara, 2018b; Cai et al., 2018b] and graph neural networks (GNNs) [Zhang et al., 2020b; Wu et al., 2020; Zhou et al., 2018] have drawn increasing attention in the last decade. They have been successfully applied to recommendation systems [Ying et al., 2018a; Ma et al., 2019], fraud detection [Akoglu et al., 2015], bioinformatics [Su et al., 2020; Zitnik and Leskovec, 2017], physical simulation [Kipf et al., 2018], traffic forecasting [Li et al., 2018b; Yu et al., 2018], knowledge representation [Wang et al., 2017], drug re-purposing [Ioannidis et al., 2020; Gysi et al., 2020] and pandemic prediction [Kapoor et al., 2020] for Covid-19.

∗ Equal contributions

Despite the popularity of graph machine learning algorithms, the existing literature relies heavily on manual hyper-parameter or architecture design to achieve the best performance, resulting in costly human effort when a vast number of models emerge for various graph tasks. Take GNNs as an example: at least one hundred new general-purpose architectures were published in top-tier machine learning and data mining conferences in 2020 alone, not to mention cross-disciplinary research on task-specific designs. Ever more human effort is inevitably needed if we stick to the manual trial-and-error paradigm in designing the optimal algorithms for targeted tasks.

On the other hand, automated machine learning (AutoML) has been extensively studied to reduce human effort in developing and deploying machine learning models [He et al., 2020; Yao et al., 2018]. Complete AutoML pipelines have the potential to automate every step of machine learning, including automated data collection and cleaning, feature engineering, and model selection and optimization. Due to the popularity of deep learning models, hyper-parameter optimization (HPO) [Bergstra and Bengio, 2012; Bergstra et al., 2011; Snoek et al., 2012] and neural architecture search (NAS) [Elsken et al., 2019] are the most widely studied. AutoML has matched or surpassed human-level performance [Zoph and Le, 2017; Liu et al., 2018; Pham et al., 2018] with little human guidance in areas such as computer vision [Zoph et al., 2018; Real et al., 2019].

Automated machine learning on graphs, combining the advantages of AutoML and graph machine learning, naturally serves as a promising research direction to further boost model performance, and it has attracted increasing interest from the community. In this paper, we provide, to the best of our knowledge, the first comprehensive and systematic review of automated machine learning on graphs. Specifically, we focus on two major topics: HPO and NAS for graph machine learning. For HPO, we focus on how to develop scalable methods. For NAS, we follow the literature and compare different methods in terms of search spaces, search strategies, and performance estimation strategies. How different methods tackle the challenges of

arXiv:2103.00742v1 [cs.LG] 1 Mar 2021
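
The three aspects used in the introduction to compare NAS methods (search space, search strategy, performance estimation strategy) can be illustrated with a small sketch. Everything in it is hypothetical: the per-layer choices, the mutation-based local search, and especially `estimate_performance`, which is an arbitrary scoring rule with no empirical meaning that stands in for training and validating a candidate GNN. It is not one of the methods surveyed in the paper.

```python
import random

# (1) Search space: hypothetical per-layer choices for a message-passing GNN.
LAYER_CHOICES = {
    "aggregate": ["sum", "mean", "max"],
    "activation": ["relu", "tanh", "identity"],
    "hidden_dim": [16, 64, 256],
}
MAX_LAYERS = 3


def random_architecture(rng):
    """Sample a depth, then sample every choice independently for each layer."""
    depth = rng.randint(1, MAX_LAYERS)
    return [
        {name: rng.choice(options) for name, options in LAYER_CHOICES.items()}
        for _ in range(depth)
    ]


def mutate(arch, rng):
    """Return a copy of arch with one choice in one layer re-sampled."""
    child = [dict(layer) for layer in arch]
    layer = rng.choice(child)
    name = rng.choice(list(LAYER_CHOICES))
    layer[name] = rng.choice(LAYER_CHOICES[name])
    return child


# (3) Performance estimation: an arbitrary toy rule, NOT a real measurement.
def estimate_performance(arch):
    total = 0.0
    for layer in arch:
        total += 1.0 if layer["aggregate"] == "sum" else 0.5
        total += 1.0 if layer["activation"] == "relu" else 0.5
        total -= abs(layer["hidden_dim"] - 64) / 256
    return total / len(arch)


# (2) Search strategy: mutation-based local search that keeps the best
# architecture seen so far. Depth is fixed by the initial sample.
def local_search(n_steps, seed=0):
    rng = random.Random(seed)
    best = random_architecture(rng)
    best_score = estimate_performance(best)
    for _ in range(n_steps):
        child = mutate(best, rng)
        child_score = estimate_performance(child)
        if child_score >= best_score:
            best, best_score = child, child_score
    return best, best_score


if __name__ == "__main__":
    arch, score = local_search(n_steps=200)
    print(arch, round(score, 3))
```

Keeping the three parts as separate functions mirrors the decomposition: any one of them can be swapped (for example a different search strategy) without touching the other two.
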
