Vol. 23 no. 9 2007, pages 1141–1147
BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btm045
Data and text mining
Discovery of microRNA–mRNA modules via population-based
probabilistic learning
Je-Gun Joung¹, Kyu-Baek Hwang², Jin-Wu Nam¹, Soo-Jin Kim¹ and Byoung-Tak Zhang¹,³,*
¹Center for Bioinformation Technology, Seoul National University, Seoul 151-742, ²School of Computing, Soongsil University, Seoul 156-743 and ³School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea
Received on October 30, 2006; revised on December 15, 2006; accepted on February 4, 2007
Advance Access publication March 9, 2007
Associate Editor: Satoru Miyano
ABSTRACT
Motivation: MicroRNAs (miRNAs) and mRNAs constitute an
important part of gene regulatory networks, influencing diverse
biological phenomena. Elucidating closely related miRNAs and
mRNAs can be an essential first step towards the discovery of
their combinatorial effects on different cellular states. Here, we
propose a probabilistic learning method to identify synergistic
miRNAs involved in regulating their condition-specific target genes
(mRNAs) from multiple information sources, i.e. computationally
predicted target genes of miRNAs and their respective expression
profiles.
Results: We used data sets consisting of miRNA–target gene
binding information and expression profiles of miRNAs and mRNAs
on human cancer samples. Our method allowed us to detect
functionally correlated miRNA–mRNA modules involved in specific
biological processes from multiple data sources by using a balanced
fitness function and efficient searching over multiple populations.
The proposed algorithm found two miRNA–mRNA modules, highly
correlated with respect to their expression and biological function.
Moreover, the mRNAs included in the same module showed much
higher correlations when the related miRNAs were highly expressed,
demonstrating our method’s ability to find coherent miRNA–mRNA
modules. Most members of these modules have been
reported to be closely related to cancer. Consequently, our
method can provide a primary source of miRNA and target sets
presumed to constitute closely related parts of gene regulatory
pathways.
Contact: btzhang@bi.snu.ac.kr
Supplementary information: Supplementary data are available at
Bioinformatics online.
1 INTRODUCTION
MicroRNAs (miRNAs) are a class of small endogenous RNA
molecules (~22 nt), which are presumed to participate in the
developmental control of gene expression (Bartel et al., 2004).
They can suppress their target genes (mRNAs)
posttranscriptionally by complementary base pairing. Hence,
miRNAs are related to diverse cellular processes and regarded
as important components of the gene regulatory network.
Researchers have tried to elucidate the function of miRNAs
in cellular processes using experimental and computational
approaches (Denli et al., 2004; Han et al., 2006; Thomson et al.,
2004). Early efforts in this area mainly focused on the
identification of miRNAs and their targets (Lewis et al.,
2005; Nam et al., 2006). Expression profiling techniques were
also deployed for characterizing differentially expressed
miRNAs according to cellular states and environmental
conditions (Liu et al., 2004, 2005; Thomson et al., 2004).
Correspondingly, significant amounts of data on miRNAs have
now accumulated (Griffiths-Jones et al., 2006).
To understand the regulatory mechanism of miRNAs in
complex cellular systems, it is important to identify the
functional modules involved in complex interactions between
miRNAs and their targets. Previously, the concept of miRNA
regulatory modules (MRMs) was introduced by Yoon and
De Micheli (2005). Their modules rely only on miRNA–mRNA
duplexes at the sequence level, without considering
expression profiles. Additional information on the expression
profiles of miRNAs and mRNAs could be useful to detect the
actual MRMs in specific biological processes. Recently,
integrated analysis of targeting information and expression
profiles was applied to discover functional miRNA targets
(Huang et al., 2006; Zilberstein et al., 2006). These studies reported that
the utilization of expression profiles could help identify targets
with high confidence.
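
As a minimal illustration of why expression profiles can add confidence to sequence-based predictions, the sketch below keeps only those predicted miRNA–mRNA pairs whose expression profiles are negatively correlated across samples, on the assumption that a miRNA suppressing its target tends to show an inverse expression pattern. This is not the procedure of the studies cited above, nor the method proposed in this paper; the function names, the Pearson criterion, the threshold `max_r` and the toy data are all hypothetical choices made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)


def supported_targets(predicted, mirna_expr, mrna_expr, max_r=-0.5):
    """Keep predicted (miRNA, mRNA) pairs with correlation <= max_r.

    max_r is an assumed, illustrative cut-off, not a value from the paper.
    """
    kept = []
    for mirna, mrna in predicted:
        r = pearson(mirna_expr[mirna], mrna_expr[mrna])
        if r <= max_r:
            kept.append((mirna, mrna, r))
    return kept


# Toy data, invented for illustration: expression over four samples.
mirna_expr = {"miR-A": [1.0, 2.0, 3.0, 4.0]}
mrna_expr = {
    "gene1": [4.0, 3.0, 2.0, 1.0],  # falls as miR-A rises
    "gene2": [1.0, 2.0, 3.0, 4.0],  # rises together with miR-A
}
predicted = [("miR-A", "gene1"), ("miR-A", "gene2")]

result = supported_targets(predicted, mirna_expr, mrna_expr)
for mirna, mrna, r in result:
    print(f"{mirna} -> {mrna}: r = {r:.2f}")
```

In this toy setting only the pair with the inverse expression pattern is retained; a real analysis would also need to handle noise, sample size and multiple testing, which the sketch ignores.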
Here we propose a population-based probabilistic method
to identify coherent miRNA–mRNA modules by integrating
heterogeneous information, i.e. computationally predicted
target genes of miRNAs and the expression profiles of both
mRNAs and miRNAs. In this work, miRNA–mRNA modules
are defined as groups of miRNAs and their target mRNAs
involved in similar biological processes. In our approach,
a module consists of highly related miRNAs and their targets,
which can be thought to have similar biological functions.
Our main idea is to combine multiple information sources to
extract common patterns among them, and to minimize noise
and errors in each information source. Figure 1 illustrates our
*To whom correspondence should be addressed.
© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org