Published online 09 May 2015 Nucleic Acids Research, 2015, Vol. 43, Web Server issue W65–W71
doi: 10.1093/nar/gkv458
Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences
Bin Liu¹,²,³,*, Fule Liu¹, Xiaolong Wang¹,², Junjie Chen¹, Longyun Fang¹ and Kuo-Chen Chou³,⁴,*
¹School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China,
²Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China,
³Gordon Life Science Institute, Belmont, MA 02478, USA and
⁴Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
Received January 16, 2015; Revised April 26, 2015; Accepted April 27, 2015
ABSTRACT
With the avalanche of biological sequences generated in the post-genomic age, one of the most challenging problems in computational biology is how to formulate the sequence of a biological sample (such as DNA, RNA or protein) as a discrete model or vector that effectively reflects its sequence pattern information or captures its key features. Although several web servers and stand-alone tools have been developed to address this problem, each of them can handle only one type of sample. Furthermore, the number of their built-in properties is limited, so it is often difficult for users to formulate biological sequences according to their desired features or properties. In this article, we propose a much more flexible web server with a much larger number of built-in properties, called Pse-in-One (http://bioinformatics.hitsz.edu.cn/Pse-in-One/), which can, through its 28 different modes, generate nearly all the possible feature vectors for DNA, RNA and protein sequences. In particular, it can also generate feature vectors from properties defined by the users themselves. These feature vectors can be easily combined with machine-learning algorithms to develop computational predictors and analysis methods for various tasks in bioinformatics and systems biology. It is anticipated that the Pse-in-One web server will become a very useful tool in computational proteomics and genomics, as well as in biological sequence analysis. Moreover, to maximize users’ convenience, its stand-alone version can be downloaded from http://bioinformatics.hitsz.edu.cn/Pse-in-One/download/ and run directly on Windows, Linux, Unix and Mac OS.
INTRODUCTION
To expedite the analysis of an increasing number of biological sequences, many machine-learning algorithms have been introduced into computational biology. However, nearly all the existing algorithms can only handle vectors, not sequence samples, as elaborated in (1).
Yet a vector defined in a discrete model may completely lose the sequence-order information. To cope with this dilemma, the idea of pseudo amino acid composition, or PseAAC (2,3), was proposed. In addition to the well-known amino acid composition (AAC), PseAAC contains special terms called ‘pseudo components’. It is through these terms that the sequence-order effects are approximately reflected (2,3).
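The contrast between AAC and PseAAC can be illustrated with a minimal sketch. It loosely follows the general form usually given for PseAAC: 20 normalized composition terms followed by a few weighted sequence-order correlation terms. Several things here are assumptions made only for illustration and are not taken from this article: the function names `aac` and `pse_aac`, the parameters `lam` (number of pseudo components) and `w` (weight factor), and the use of a single property, the Kyte-Doolittle hydropathy index, in place of whichever properties a given PseAAC mode actually uses. It is not the Pse-in-One implementation.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here as a stand-in property
# (an assumption for illustration; other property sets are possible).
KD = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    mean = sum(scale.values()) / len(scale)
    sd = (sum((v - mean) ** 2 for v in scale.values()) / len(scale)) ** 0.5
    return {aa: (v - mean) / sd for aa, v in scale.items()}


H = _standardize(KD)


def aac(seq):
    """Amino acid composition: 20 occurrence frequencies, order-independent."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def pse_aac(seq, lam=2, w=0.05):
    """Hypothetical PseAAC-style vector of length 20 + lam.

    Only the 20 standard one-letter residue codes are supported.
    """
    if lam >= len(seq):
        raise ValueError("lam must be smaller than the sequence length")
    freqs = aac(seq)
    thetas = []
    for k in range(1, lam + 1):
        # k-th tier correlation factor: mean squared property difference
        # between residues that are k positions apart.
        pairs = zip(seq, seq[k:])
        thetas.append(sum((H[a] - H[b]) ** 2 for a, b in pairs) / (len(seq) - k))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Two sequences with the same residues in a different order.
seq_a = "AAKKLL"
seq_b = "AKLAKL"
print(aac(seq_a) == aac(seq_b))          # composition cannot tell them apart
print(pse_aac(seq_a) == pse_aac(seq_b))  # the pseudo components can
```

In this sketch the two example sequences have identical AAC vectors, while their PseAAC-style vectors differ because the correlation terms depend on which residues are neighbours. The vector length stays fixed at `20 + lam` regardless of sequence length, which is what lets such vectors be fed to standard machine-learning algorithms.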
Since its introduction in 2001, the concept of PseAAC has spread rapidly into almost all areas of computational proteomics (see the long list of references cited in a recent paper (4)).
Encouraged by the success of PseAAC in dealing with protein/peptide sequences, corresponding approaches were recently proposed for DNA sequences (5–7) and RNA sequences (8).
Because approaches of this kind have been widely and increasingly used in many areas of computational biology, a number of web servers and stand-alone programs have been developed for generating various pseudo components for DNA sequences (9,10), RNA sequences (8) and protein sequences (4,11–13).
*To whom correspondence should be addressed. Tel: +1 858 484 1018; Fax: +1 858 484 1018; Email: bliu@gordonlifescience.org or bliu@insun.hit.edu.cn
Correspondence may also be addressed to Kuo-Chen Chou. Tel: +1 858 484 1018; Fax: +1 858 484 1018; Email: kcchou@gordonlifescience.org
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.