
Automated Machine Learning on Graphs: A Survey
Ziwei Zhang∗, Xin Wang∗ and Wenwu Zhu
Tsinghua University, Beijing, China
zw-zhang16@mails.tsinghua.edu.cn, {xin wang,wwzhu}@tsinghua.edu.cn
Abstract
Machine learning on graphs has been extensively studied in both
academia and industry. However, as the literature on graph learning
booms with a vast number of emerging methods and techniques, it
becomes increasingly difficult to manually design the optimal machine
learning algorithm for different graph-related tasks. To address this
challenge, automated machine learning (AutoML) on graphs, which
combines the strengths of graph machine learning and AutoML, is
gaining attention from the research community. In this paper, we
therefore comprehensively survey AutoML on graphs, focusing primarily
on hyper-parameter optimization (HPO) and neural architecture search
(NAS) for graph machine learning. We further overview libraries
related to automated graph machine learning and discuss in depth
AutoGL, the first dedicated open-source library for AutoML on graphs.
Finally, we share our insights on future research directions for
automated graph machine learning. To the best of our knowledge, this
paper is the first systematic and comprehensive review of automated
machine learning on graphs.
1 Introduction
Graph data is ubiquitous in our daily life. We can use graphs to model
the complex relationships and dependencies between entities, ranging
from small molecules in proteins and particles in physical simulations
to large nationwide power grids and global airlines. Therefore,
machine learning on graphs has long been an important research
direction for both academia and industry [Newman, 2018]. In
particular, network embedding [Cui et al., 2018; Hamilton et al.,
2017; Goyal and Ferrara, 2018b; Cai et al., 2018b] and graph neural
networks (GNNs) [Zhang et al., 2020b; Wu et al., 2020; Zhou et al.,
2018] have drawn increasing attention in the last decade. They have
been successfully applied to recommendation systems [Ying et al.,
2018a; Ma et al., 2019], fraud detection [Akoglu et al., 2015],
bioinformatics [Su et al., 2020; Zitnik and Leskovec, 2017], physical
simulation [Kipf et al., 2018], traffic forecasting [Li et al., 2018b;
Yu et al., 2018], knowledge representation [Wang et al., 2017], drug
repurposing [Ioannidis et al., 2020; Gysi et al., 2020] and pandemic
prediction [Kapoor et al., 2020] for COVID-19.
∗ Equal contributions
Despite the popularity of graph machine learning algorithms, the
existing literature relies heavily on manual hyper-parameter or
architecture design to achieve the best performance, which requires
costly human effort as a vast number of models emerge for various
graph tasks. Taking GNNs as an example, at least one hundred new
general-purpose architectures were published in top-tier machine
learning and data mining conferences in 2020 alone, not to mention
cross-disciplinary research on task-specific designs. Ever more human
effort will inevitably be needed if we stick to the manual
trial-and-error paradigm in designing the optimal algorithms for
targeted tasks.
On the other hand, automated machine learning (AutoML) has been
extensively studied to reduce human effort in developing and deploying
machine learning models [He et al., 2020; Yao et al., 2018]. Complete
AutoML pipelines have the potential to automate every step of machine
learning, including data collection and cleaning, feature engineering,
and model selection and optimization. Due to the popularity of deep
learning models, hyper-parameter optimization (HPO) [Bergstra and
Bengio, 2012; Bergstra et al., 2011; Snoek et al., 2012] and neural
architecture search (NAS) [Elsken et al., 2019] are the most widely
studied. AutoML has matched or surpassed human-level performance
[Zoph and Le, 2017; Liu et al., 2018; Pham et al., 2018] with little
human guidance in areas such as computer vision [Zoph et al., 2018;
Real et al., 2019].
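As a minimal illustration of what HPO automates, the sketch below runs a
plain random search, in the spirit of [Bergstra and Bengio, 2012], over a
small search space. The hyper-parameter names, their ranges, and the toy
scoring function are assumptions made only for this example; in practice
the score would come from training a graph model and evaluating it on
validation data.

```python
import math
import random

# Hypothetical search space: names and ranges are illustrative assumptions.
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-1),     # sampled log-uniformly
    "hidden_dim": [16, 32, 64, 128],   # sampled uniformly from the list
    "dropout": (0.0, 0.8),             # sampled uniformly
}


def sample_config(rng):
    """Draw one hyper-parameter configuration at random."""
    lr_lo, lr_hi = SEARCH_SPACE["learning_rate"]
    dr_lo, dr_hi = SEARCH_SPACE["dropout"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lr_lo), math.log10(lr_hi)),
        "hidden_dim": rng.choice(SEARCH_SPACE["hidden_dim"]),
        "dropout": rng.uniform(dr_lo, dr_hi),
    }


def toy_validation_score(config):
    """Stand-in for 'train a model, return validation performance'.

    The score is at most 0 and is highest near learning_rate=1e-2,
    hidden_dim=64, dropout=0.5 (an arbitrary choice for the example).
    """
    lr_term = -(math.log10(config["learning_rate"]) + 2.0) ** 2
    dim_term = -abs(config["hidden_dim"] - 64) / 64.0
    drop_term = -(config["dropout"] - 0.5) ** 2
    return lr_term + dim_term + drop_term


def random_search(num_trials, seed=0):
    """Evaluate num_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, -math.inf
    for _ in range(num_trials):
        config = sample_config(rng)
        score = toy_validation_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    config, score = random_search(num_trials=50)
    print(config, score)
```

Each trial here is cheap, whereas a real trial requires training a model,
which is why the cost per evaluation dominates the design of practical HPO
methods.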
Automated machine learning on graphs, combining the advantages of
AutoML and graph machine learning, naturally serves as a promising
research direction to further boost model performance, and has
attracted increasing interest from the community. In this paper, we
provide, to the best of our knowledge, the first comprehensive and
systematic review of automated machine learning on graphs.
Specifically, we focus on two major topics: HPO and NAS for graph
machine learning. For HPO, we focus on how to develop scalable
methods. For NAS, we follow the literature and compare different
methods in terms of search spaces, search strategies, and performance estimation
strategies. How different methods tackle the challenges of
arXiv:2103.00742v1 [cs.LG] 1 Mar 2021