
Computational Biology and Chemistry 58 (2015) 173–181
Contents lists available at ScienceDirect
Computational Biology and Chemistry
journal homepage: www.elsevier.com/locate/compbiolchem
Research Article
MOEPGA: A novel method to detect protein complexes in yeast
protein–protein interaction networks based on MultiObjective
Evolutionary Programming Genetic Algorithm
Buwen Cao
a,b
, Jiawei Luo
a,b,∗
, Cheng Liang
a,b
, Shulin Wang
a,b
, Dan Song
a,b
a
College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
b
Collaboration and Innovation Center for Digital Chinese Medicine of 2011 Project of Colleges and Universities in Hunan Province, China
article info
Article history:
Received 2 February 2015
Received in revised form 2 June 2015
Accepted 22 June 2015
Available online 7 July 2015
Keywords:
Protein–protein interaction (PPI) network
Protein complex
Multiobjective evolutionary
Normalized clustering score
abstract
The identification of protein complexes in protein–protein interaction (PPI) networks has greatly
advanced our understanding of biological organisms. Existing computational methods to detect protein
complexes are usually based on specific network topological properties of PPI networks. However, due
to the inherent complexity of the network structures, the identification of protein complexes may not
be fully addressed by using single network topological property. In this study, we propose a novel Multi-
Objective Evolutionary Programming Genetic Algorithm (MOEPGA) which integrates multiple network
topological features to detect biologically meaningful protein complexes. Our approach first systemati-
cally analyzes the multiobjective problem in terms of identifying protein complexes from PPI networks,
and then constructs the objective function of the iterative algorithm based on three common topological
properties of protein complexes from the benchmark dataset, finally we describe our algorithm, which
mainly consists of three steps, population initialization, subgraph mutation and subgraph selection oper-
ation. To show the utility of our method, we compared MOEPGA with several state-of-the-art algorithms
on two yeast PPI datasets. The experiment results demonstrate that the proposed method can not only
find more protein complexes but also achieve higher accuracy in terms of fscore. Moreover, our approach
can cover a certain number of proteins in the input PPI network in terms of the normalized clustering
score. Taken together, our method can serve as a powerful framework to detect protein complexes in
yeast PPI networks, thereby facilitating the identification of the underlying biological functions.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
Proteins are critical components of cell activities and are essen-
tial to our understanding of molecular function as well as biological
process. In biological organisms, proteins are usually organized
into a protein complex, in which they carry out specific biological
functions cooperatively (Zhang et al., 2013). For example, proteins
YDL105W, YDR288W, YEL019C, YER038C, YLR007W, YLR383W,
YML023C, and YOL034W form a complex “Smc5-Smc6” and regu-
late both recombination and kinetochore sumoylation to promote
chromosomal maintenance during growth process (Yong-Gonzales
et al., 2012). Therefore, studies have been focused on elucidating
∗
Corresponding author. Permanent address: College of Computer Science and
Electronic Engineering, Hunan University, Changsha 410082, China.
E-mail address: luojiawei@hnu.edu.cn (J. Luo).
protein functions in PPI networks based on their interacting part-
ners (Girvan and Newman, 2002; Jin et al., 2015; Mering et al.,
2002), which further strengthen our biological knowledge on pro-
tein assembly processes for cellular organization.
Although there are a number of ways to detect protein com-
plexes experimentally, such as yeast-two-hybrid (Zhang et al.,
2013) and tandem affinity purification with mass spectrometry
(Chen and Wu, 2013; Zhang et al., 2013), they were shown to
have certain limitations (Li et al., 2005). Specifically, PPI data
derived from high-throughput experiments usually have high
false-positive and false-negative rates, which substantially affects
the accuracy of the experiment results (Chen and Wu, 2013). There-
fore, computational approaches to detect protein complexes are
developed as useful complements to the experimental methods.
Several methods based on density have been proposed to
identify densely connected subgraphs in PPI networks, where sub-
graphs with density above a pre-defined threshold were considered
http://dx.doi.org/10.1016/j.compbiolchem.2015.06.006
1476-9271/© 2015 Elsevier Ltd. All rights reserved.