Network predicting drug’s anatomical therapeutic chemical code
Yong-Cui Wang¹, Shi-Long Chen¹, Nai-Yang Deng² and Yong Wang³,∗
¹Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China, 810001.
²College of Science, China Agricultural University, Beijing, China, 100083.
³National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, 100190.
ABSTRACT
Motivation: Discovering the rules behind drugs’ Anatomical Therapeutic Chemical (ATC) classification at the molecular level is vitally important for understanding the action of the vast majority of drugs. However, few studies have attempted to annotate drugs’ potential ATC-codes by computational approaches.
Results: Here, we introduce a drug-target network to predict drugs’ ATC-codes computationally and propose a novel method named NetPredATC. Starting from the assumption that drugs with similar chemical structures or target proteins share common ATC-codes, NetPredATC aims to assign potential ATC-codes to drugs by integrating chemical structures and target proteins. Specifically, we first construct a gold-standard positive dataset from drugs’ ATC-code annotation databases. Then we characterize ATC-codes and drugs by their similarity profiles and define a kernel function to correlate them. Finally, we use a kernel method, the support vector machine (SVM), to predict drugs’ ATC-codes automatically. Our method was validated on four drug datasets with various target proteins, including enzymes, ion channels (ICs), G-protein coupled receptors (GPCRs), and nuclear receptors (NRs). We found that both drugs’ chemical structures and their target proteins are predictive, and that target protein information gives better accuracy. Further integrating these two data sources revealed more experimentally validated ATC-codes for drugs. We extensively compared NetPredATC with SuperPred, a method based on chemical similarity only. Experimental results showed that NetPredATC outperforms SuperPred in both predictive coverage and accuracy. In addition, database searches and functional annotation analysis suggest that our novel predictions merit future experimental validation.
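The Results paragraph above summarizes the pipeline only at a high level. The following Python sketch illustrates one plausible reading of it and is not the NetPredATC implementation: each drug is described by its row of an integrated drug-drug similarity matrix (here, hypothetically, the element-wise average of a chemical-structure and a target-protein similarity matrix), each ATC-code by a hypothetical indicator profile over drugs, and the two are correlated with a tensor-product pairwise kernel K((d, a), (d', a')) = K_drug(d, d') · K_ATC(a, a'). The averaging scheme, the cosine profile kernel, the toy matrices, and all function names are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length profiles (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def average_similarity(s1, s2):
    """Assumed integration scheme: element-wise mean of two drug-drug similarity matrices."""
    return [[(a + b) / 2.0 for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]


def profile_kernel(profiles):
    """Kernel between objects, each described by its similarity profile (one row each)."""
    return [[cosine(p, q) for q in profiles] for p in profiles]


def pairwise_gram(pairs, k_drug, k_atc):
    """Tensor-product pairwise kernel: K((d, a), (d2, a2)) = Kd(d, d2) * Ka(a, a2)."""
    return [[k_drug[d][d2] * k_atc[a][a2] for (d2, a2) in pairs] for (d, a) in pairs]


# Hypothetical toy data: 3 drugs, 2 ATC-codes.
s_chem = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]    # chemical similarity
s_target = [[1.0, 0.6, 0.0], [0.6, 1.0, 0.3], [0.0, 0.3, 1.0]]  # target-protein similarity
# Hypothetical ATC profiles: drugs 0 and 1 carry code 0, drug 2 carries code 1.
atc_profiles = [[1, 1, 0], [0, 0, 1]]

k_drug = profile_kernel(average_similarity(s_chem, s_target))
k_atc = profile_kernel(atc_profiles)
pairs = [(0, 0), (1, 0), (2, 1)]  # (drug index, ATC index) training pairs
gram = pairwise_gram(pairs, k_drug, k_atc)
for row in gram:
    print([round(x, 3) for x in row])
```

Cosine Gram matrices are positive semidefinite, and the pairwise Gram matrix is built from products of their entries (a principal submatrix of their Kronecker product when pairs are distinct), so `gram` is a valid kernel matrix. It could be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, together with labels marking each (drug, ATC-code) pair as positive or negative.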
Conclusion: Our new method, NetPredATC, can predict drugs’ ATC-codes more accurately by incorporating a drug-target network and integrating multiple data sources, which will promote the understanding of drug mechanisms as well as drug repositioning and discovery.
Availability: NetPredATC is available at http://doc.aporc.org/wiki/NetPredATC.
Contact: ycwang@nwipb.cas.cn, ywang@amss.ac.cn
∗To whom correspondence should be addressed
1 INTRODUCTION
The Anatomical Therapeutic Chemical (ATC) classification system categorizes drug substances at different levels according to their therapeutic, chemical and pharmacological properties and their practical applications. The system is recommended by the World Health Organization (WHO), and drugs’ ATC-codes have been widely applied in almost all drug utilization studies (WHO, 2006). Specifically, the ATC classification system serves as a basic tool for drug utilization research, and it enables drug consumption statistics to be presented and compared at the international level. In addition, ATC prediction will greatly facilitate recent drug repositioning and drug combination studies. Though useful, mapping ATC-codes to drugs is quite challenging.
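The levels of the ATC system are encoded positionally in each seven-character code. As an illustration, the sketch below uses two hypothetical helpers, `atc_levels` and `shared_depth` (not part of NetPredATC), to split a code into its five levels and to count how many leading levels two codes share. The level names follow those commonly given in WHO's ATC documentation; the example codes are N02BE01 (paracetamol) and N02BA01 (acetylsalicylic acid).

```python
import re

# ATC levels and the end position of each level's prefix in a 7-character code.
LEVELS = [
    ("anatomical main group", 1),
    ("therapeutic subgroup", 3),
    ("therapeutic/pharmacological subgroup", 4),
    ("chemical/therapeutic/pharmacological subgroup", 5),
    ("chemical substance", 7),
]
# Full code: letter, 2 digits, 2 letters, 2 digits (e.g. N02BE01).
ATC_PATTERN = re.compile(r"[A-Z]\d{2}[A-Z]{2}\d{2}")


def atc_levels(code):
    """Split a full ATC code into its cumulative prefix at each of the five levels."""
    if not ATC_PATTERN.fullmatch(code):
        raise ValueError(f"not a full 7-character ATC code: {code!r}")
    return [(name, code[:end]) for name, end in LEVELS]


def shared_depth(code1, code2):
    """Number of leading ATC levels (0-5) at which two full codes agree."""
    depth = 0
    for _name, end in LEVELS:
        if code1[:end] != code2[:end]:
            break
        depth += 1
    return depth


for name, prefix in atc_levels("N02BE01"):  # paracetamol
    print(f"{name}: {prefix}")
print(shared_depth("N02BE01", "N02BA01"))  # paracetamol vs acetylsalicylic acid; prints 3
```

Counting shared leading levels is one simple way to express how closely two drugs' ATC assignments agree; the two analgesics above share the anatomical, therapeutic and pharmacological levels but differ at the chemical subgroup.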
Recently, ATC-codes for some well-characterized drugs have been deposited in databases such as KEGG BRITE (Kanehisa et al., 2006) and DrugBank (Wishart et al., 2008). These databases provide high-quality, expert-curated data. However, they are small in scale, and their coverage is far from sufficient for practical use. Even in well-curated drug datasets, the ATC-code assignments are far from complete. For example, the dataset of Yamanishi et al. (2008) contains drugs that target four types of proteins: enzymes, ion channels (ICs), G-protein coupled receptors (GPCRs), and nuclear receptors (NRs). All of these drugs have manually curated target proteins from KEGG BRITE (Kanehisa et al., 2006), BRENDA (Schomburg et al., 2004), SuperTarget (Gunther et al., 2008), and DrugBank (Wishart et al., 2008). Even in this high-quality dataset, 102 of the 445 drugs targeting enzymes, 13 of the 210 drugs targeting ICs, 23 of the 223 drugs targeting GPCRs, and 4 of the 54 drugs targeting NRs have no ATC-code at all. The percentage of drugs without ATC-codes thus ranges from about 6% (ICs) to about 23% (enzymes).
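As a quick check, the per-class fractions of drugs lacking ATC-codes can be computed directly from the counts quoted for the Yamanishi et al. (2008) dataset. The variable names are illustrative, and the pooled figure is a simple sum that counts a drug once in every target class where it is listed.

```python
# Counts quoted in the text for the Yamanishi et al. (2008) dataset:
# target class -> (drugs without any ATC-code, total drugs in the class).
counts = {
    "enzyme": (102, 445),
    "IC": (13, 210),
    "GPCR": (23, 223),
    "NR": (4, 54),
}

# Percentage of drugs lacking ATC-codes within each target class.
percent = {cls: 100.0 * missing / total for cls, (missing, total) in counts.items()}
for cls, pct in percent.items():
    print(f"{cls}: {pct:.1f}% without ATC-codes")

# Simple pooled sum; a drug listed under several target classes is counted in each.
missing_all = sum(m for m, _ in counts.values())
total_all = sum(t for _, t in counts.values())
print(f"all classes: {missing_all}/{total_all} = {100.0 * missing_all / total_all:.1f}%")
```

The computed fractions are about 22.9% (enzymes), 6.2% (ICs), 10.3% (GPCRs) and 7.4% (NRs), i.e. 142 of 932 drug entries (about 15.2%) when pooled.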
The bottleneck is that the current data collection procedure relies heavily on human curation, which is inefficient. One way forward is to learn the underlying rules by which drugs are assigned ATC-codes from the available high-quality ATC-code annotations, and then to assign new ATC-codes to drugs automatically with a computational predictor. This strategy will accelerate the functional characterization of drugs under the ATC classification system,
Associate Editor: Dr. Olga Troyanskaya
© The Author (2013). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Bioinformatics Advance Access published April 5, 2013