Molecular Property Prediction: A Multilevel Quantum Interactions
Modeling Perspective
Chengqiang Lu†, Qi Liu†*, Chao Wang†, Zhenya Huang†, Peize Lin‡, Lixin He‡
†Anhui Province Key Lab. of Big Data Analysis and Application, University of S&T of China
‡Key Laboratory of Quantum Information, University of S&T of China
{qiliuql, helx}@ustc.edu.cn, {lunar, wdyx2012, huangzhy, linpz}@mail.ustc.edu.cn
Abstract
Predicting molecular properties (e.g., atomization energy)
is an essential problem in quantum chemistry, and solving it
could accelerate research in areas such as drug design
and substance discovery. Traditional approaches based on
density functional theory (DFT) in physics have proved
time-consuming when applied to large numbers of molecules.
Recently, machine learning methods, which rely heavily on
rule-based information, have also shown potential for
this problem. However, the complex inherent quantum
interactions of molecules remain largely underexplored by
existing solutions. In this paper, we propose a generalizable and
transferable Multilevel Graph Convolutional neural Network
(MGCN) for molecular property prediction. Specifically, we
represent each molecule as a graph to preserve its internal
structure. Moreover, the well-designed hierarchical graph
neural network directly extracts features from the
conformation and spatial information, followed by the multilevel
interactions. As a consequence, the multilevel overall
representations can be used to make the prediction. Extensive
experiments on datasets of both equilibrium and off-equilibrium
molecules demonstrate the effectiveness of our model.
Furthermore, the detailed results also show that MGCN is
generalizable and transferable for the prediction.
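The graph view of a molecule mentioned above can be illustrated with a minimal sketch. It assumes a complete graph in which every atom is a node and every atom pair is an edge carrying the interatomic distance; this is an illustrative choice, not necessarily the exact construction used by MGCN. The function name `molecule_to_graph` and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def molecule_to_graph(symbols, coords):
    """Build a complete graph: one node per atom, one edge per atom pair.

    Each edge stores the Euclidean distance between the two atoms, so the
    spatial (conformation) information is kept alongside the topology.
    """
    if len(symbols) != len(coords):
        raise ValueError("symbols and coords must have the same length")
    nodes = list(symbols)
    edges = {}
    for i, j in combinations(range(len(nodes)), 2):
        edges[(i, j)] = math.dist(coords[i], coords[j])
    return nodes, edges


# Hypothetical toy geometry for illustration only; these are not the
# coordinates of a real equilibrium conformation.
symbols = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
nodes, edges = molecule_to_graph(symbols, coords)

for (i, j), dist in edges.items():
    print(f"{nodes[i]}{i} - {nodes[j]}{j}: {dist:.3f}")
```

Keeping distances on the edges, rather than only bond connectivity, is one way to retain the spatial information that a purely topological graph would discard.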
Introduction
Predicting molecular properties, such as atomization energy,
is one of the fundamental problems in quantum chemistry.
Indeed, it has attracted much attention in the relevant fields
of physics, chemistry and computer science, since it
accelerates societal and technological progress by helping to
discover substances with desired characteristics, for example
in target-specific drug design and the manufacture of new
materials (Becke 2007; Oglic, Garnett, and Gärtner 2017).
In the literature, density functional theory (DFT) plays
an important role in physics for molecular property
prediction. It rests on the premise that the quantum
interactions between particles (e.g., atoms) create the
correlation and entanglement of molecules, which are closely
related to their inherent properties (Thouless 2014). Along
this line, many quantum mechanical methods based on DFT
have been developed to model the quantum interactions of
*Contact author.
Copyright © 2019, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
[Figure 1, panels a-d. Recoverable labels: Molecule; Interaction Graph; Multiscale Interactions (Atom, Pair-wise, Triple-wise, ...); Output (Property Value, U0).]
Figure 1: Illustration of the process of a molecule (CH2O2) via our method.
molecules for the prediction (Hohenberg and Kohn 1964;
Kohn and Sham 1965). However, DFT-based methods are
computationally costly since they usually use specific functions
to determine the interactions of particles, which proves to be
extraordinarily time-consuming. For example, experimental results
indicated that it took nearly an hour to predict the properties
of a single molecule with 20 atoms (Gilmer et al. 2017).
Clearly, this cost is prohibitive for making predictions over the
large number of molecules in chemical compound space.
Therefore, it is necessary to find more efficient solutions.
Recently, inspired by the remarkable success of machine
learning in many tasks, including computer vision, natural
language processing, and natural and social science (Karpathy et
al. 2014; He et al. 2016; Huang et al. 2017; Zhu et al. 2018;
Liu et al. 2018), researchers have shown the potential of
these data-driven techniques for molecular property
prediction (Faber et al. 2017; Schütt et al. 2017a). Generally, these
studies rely mainly on rule-based feature engineering (e.g.,
bag of atom bonds) or treat molecules as grid-like structures
(e.g., 2D images or text). However, few of them directly take
the inherent quantum interactions of molecules into
consideration, which causes severe information loss and leaves the
molecular property prediction problem largely open.
Unfortunately, there are many technical and domain
challenges along this line. First, there are highly complex
quantum interactions, such as distracted attraction, exchange
repulsion and electrostatic interaction in molecules, especially
in large molecules (Kollman 1985). It is hard to model
them with analytical methods. Second, compared with
traditional tasks such as computer vision, the amount of
labeled molecule data is significantly limited, which requires a
generalizable approach for the prediction. Last but not least,
in practice, we are often provided with labeled data of small