RESEARCH Open Access
ppiPre: predicting protein-protein interactions by combining heterogeneous features
Yue Deng1,2, Lin Gao1*, Bingbo Wang1
From The 6th International Conference on Computational Systems Biology (ISB2012)
Xi’an, China. 18-20 August 2012
Abstract
Background: Protein-protein interactions (PPIs) are crucial in cellular processes. Because current biological
experimental techniques are time-consuming and expensive, and their results suffer from incompleteness and
noise, computational methods and software tools for predicting PPIs are needed. Although several approaches
have been proposed, they often support only a limited set of species and require additional data, such as
homologous interactions in other species, protein sequences and protein expression. Moreover, the predictive
abilities of different features on different kinds of PPI data have not been studied.
Results: In this paper, we propose ppiPre, an open-source framework for PPI analysis and prediction that
combines heterogeneous features: three GO-based semantic similarities, one KEGG-based co-pathway similarity
and three topology-based similarities. It supports up to twenty species, and users need to supply only the
original PPI data and gold-standard PPI data. Experiments on binary and co-complex gold-standard yeast PPI
data sets show large differences among the predictive abilities of different features on different kinds of
PPI data sets. The prediction performance on the two data sets also shows that ppiPre can handle PPI data of
different kinds and sizes. ppiPre is implemented in the R language and is freely available on CRAN
(http://cran.r-project.org/web/packages/ppiPre/).
Conclusions: We applied our framework to both binary and co-complex gold-standard PPI data sets. A detailed
analysis of the three GO aspects suggests that different GO aspects should be used for different kinds of data
sets, and that combining all three aspects often gives the best result. The analysis also shows that features
based solely on the topology of the PPI network can give very good results when predicting co-complex PPI
data. ppiPre provides useful functions for analysing PPI data and can be used to predict PPIs for multiple
species.
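The abstract names three topology-based similarities but does not define them here. As an illustration only, the sketch below assumes three widely used common-neighbour indices (Jaccard, Adamic-Adar and resource allocation) computed on an undirected PPI graph; whether these match ppiPre's exact definitions is an assumption, and the helper names (build_adjacency, topology_features, etc.) are hypothetical rather than part of the ppiPre API.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected neighbour sets from (protein_a, protein_b) pairs; self-loops are ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def jaccard(adj, u, v):
    """|N(u) & N(v)| / |N(u) | N(v)|, or 0.0 if both neighbourhoods are empty."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(deg(w)) over common neighbours w; degree-1 nodes are skipped since log(1) = 0."""
    common = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1)


def resource_allocation(adj, u, v):
    """Sum of 1 / deg(w) over common neighbours w."""
    common = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / len(adj[w]) for w in common)


def topology_features(adj, u, v):
    """Hypothetical feature vector for one candidate protein pair, in a fixed order."""
    return [jaccard(adj, u, v), adamic_adar(adj, u, v), resource_allocation(adj, u, v)]


if __name__ == "__main__":
    # Toy network (made-up proteins A-D), not real PPI data.
    edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
    adj = build_adjacency(edges)
    print(topology_features(adj, "A", "D"))
```

In the toy network, A and D are not linked but share both of their neighbours (B and C), so their Jaccard score is 1.0; scores like these are what would make topology alone informative for densely connected co-complex data.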
Background
Although different experimental methods [1,2] have
already generated a large amount of PPI data for many
model species in recent years [3], these data are
incomplete and contain many false positive interactions.
Computational approaches are therefore urgently needed
to refine them.
Recent studies have shown that PPIs can be integrated
with other kinds of biological data in a supervised
learning setting to predict PPIs [4-7]. In supervised
learning, a classifier is trained on truly interacting
protein pairs (positive samples) and on protein pairs
that do not interact (negative samples). The trained
classifier can then recover false negative interactions
and remove false positive interactions from the PPIs
supplied by users.
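The training scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ppiPre's implementation: the section does not say which classifier or negative-sampling strategy ppiPre uses, so logistic regression trained by batch gradient descent is swapped in for simplicity, negatives are assumed to be random unobserved pairs, and the function names and feature values are hypothetical.

```python
import math
import random


def sample_negatives(proteins, positives, k, seed=0):
    """Draw k distinct random pairs not in the positive set.

    Assumption: unobserved pairs are treated as non-interacting, a common
    but imperfect proxy for true negative samples.
    """
    proteins = sorted(set(proteins))
    pos = {frozenset(p) for p in positives}
    n = len(proteins)
    if k > n * (n - 1) // 2 - len(pos):
        raise ValueError("k exceeds the number of available non-positive pairs")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < k:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in pos:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    """Estimated probability that the pair with feature vector x interacts."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi  # d(log-loss)/d(logit)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Hypothetical similarity features: label 1 = positive sample, 0 = negative sample.
    X = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.1, 0.2], [0.2, 0.1], [0.0, 0.1]]
    y = [1, 1, 1, 0, 0, 0]
    model = train_logistic(X, y)
    print(predict_proba(model, [0.8, 0.8]))
```

Scoring an observed PPI low would flag it as a possible false positive, while scoring an unobserved pair high would suggest a missing (false negative) interaction; in practice the feature vectors would come from the GO, KEGG and topology similarities rather than the made-up values used here.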
Existing studies differ mainly in the features used in
the prediction framework. In these studies, different
kinds of biological evidence are extracted and used as
features to train the classifier, including Gene Ontology
(GO) functional annotations [8,9], protein sequences [10]
and protein co-expression [11]. For the organisms or
* Correspondence: lgao@mail.xidian.edu.cn
1School of Computer Science and Technology, Xidian University, Xi’an 710071, PR China
Full list of author information is available at the end of the article
Deng et al. BMC Systems Biology 2013, 7(Suppl 2):S8
http://www.biomedcentral.com/1752-0509/7/S2/S8
© 2013 Deng et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.