Sequence analysis
MultiP-SChlo: multi-label protein subchloroplast
localization prediction with Chou’s pseudo
amino acid composition and a novel multi-label
classifier
Xiao Wang 1,*, Weiwei Zhang 1, Qiuwen Zhang 1 and Guo-Zheng Li 2,*
1 School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China and
2 Department of Control Science and Engineering, Tongji University, Shanghai 201804, China
*To whom correspondence should be addressed.
Associate Editor: John Hancock
Received on December 19, 2014; revised on March 29, 2015; accepted on April 13, 2015
Abstract
Motivation: Identifying the subchloroplast localization of proteins is very helpful
for understanding the function of chloroplast proteins. A few computational
prediction methods for protein subchloroplast localization exist. However, these
methods ignore proteins with multiple subchloroplast locations when constructing
prediction models, so for such multilabel proteins they can predict only one of
the subchloroplast locations.
Results: To address this problem, a novel multilabel classifier that uses
label-specific features and label correlations simultaneously was developed for
predicting the subchloroplast location(s) of proteins with both single and
multiple location sites. As an initial study, the overall accuracy of our proposed
algorithm reaches 55.52%, which suggests that it is a promising tool for further
studies.
Availability and implementation: An online web server for our proposed algorithm,
named MultiP-SChlo, was developed and is freely accessible at
http://biomed.zzuli.edu.cn/bioinfo/multip-schlo/.
Contact: pandaxiaoxi@gmail.com or gzli@tongji.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
1 Introduction
Chloroplasts are organelles in most green plant cells, and also exist
in some other eukaryotic organisms, such as seaweed. Their main
function is to conduct photosynthesis, in which they capture and store
the energy from sunlight, transform it into chemical energy, and
finally release oxygen from water. In addition to photosynthesis,
they are responsible for many other functions, including fatty acid
synthesis and the immune response in plants. Chloroplast proteins
play different roles in the biological processes mentioned above, so
knowing the functions of these proteins is of significant value.
Because the functions and localizations of chloroplast proteins are
very closely related, identifying the subchloroplast localizations of
these proteins is very helpful for understanding their functions.
Through in-depth study of cell organelles, researchers have
found a number of substructures within them, such as nuclear
chromatin, heterochromatin, the nuclear envelope and the nucleolus in
the nucleus; the inner and outer membranes in mitochondria; and
the stroma and thylakoid in the chloroplast. To understand the
function of chloroplast proteins more deeply, it is necessary to
identify their subchloroplast localizations at the organelle level.
As can be seen from the recently released UniProtKB/
Swiss-Prot database (release 2013_05), there are a total of 14 408
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Bioinformatics, 31(16), 2015, 2639–2645
doi: 10.1093/bioinformatics/btv212
Advance Access Publication Date: 20 April 2015
Original Paper