iSeeRNA: efficient identification of long non-coding RNAs in transcriptome sequencing data using an SVM algorithm
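iSeeRNA addresses the challenge of identifying long intergenic non-coding RNAs (lincRNAs) from transcriptome sequencing data. lincRNAs are an emerging class of non-coding RNAs that act as potent gene regulators. High-throughput RNA sequencing makes it possible to discover novel transcripts by assembly, but accurately separating lincRNAs from protein-coding transcripts (PCTs) among the many assembled transcripts remains difficult.

The result of the study is iSeeRNA, a classifier based on a support vector machine (SVM), a machine-learning algorithm widely used for pattern recognition and classification. iSeeRNA is designed to identify lincRNAs, whose characteristics differ from those of PCTs.

According to the authors, iSeeRNA outperforms other existing software. The research team also provides a public web server intended for small datasets.

The main conclusion is that iSeeRNA achieves high prediction accuracy and runs several orders of magnitude faster than similar programs. It can therefore save time and computing resources in large-scale transcriptome analyses, and it can be integrated into existing analysis pipelines as a tool for lincRNA studies.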
RESEARCH Open Access
iSeeRNA: identification of long intergenic
non-coding RNA transcripts from
transcriptome sequencing data
Kun Sun1,2, Xiaona Chen1,3, Peiyong Jiang1,2, Xiaofeng Song4*, Huating Wang1,3*, Hao Sun1,2*
From ISCB-Asia 2012
Shenzhen, China. 17-19 December 2012
Abstract
Background: Long intergenic non-coding RNAs (lincRNAs) are emerging as a novel class of non-coding RNAs and
potent gene regulators. High-throughput RNA-sequencing combined with de novo assembly promises large-scale
discovery of novel transcripts. However, the identification of lincRNAs from thousands of assembled transcripts is
still challenging due to the difficulty of separating them from protein coding transcripts (PCTs).
Results: We have implemented iSeeRNA, a support vector machine (SVM)-based classifier for the identification of
lincRNAs. iSeeRNA shows better performance compared to other software. A publicly available web server for
iSeeRNA is also provided for small datasets.
Conclusions: iSeeRNA demonstrates high prediction accuracy and runs several orders of magnitude faster than other
similar programs. It can be integrated into the transcriptome data analysis pipelines or run as a web server, thus
offering a valuable tool for lincRNA study.
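To make the idea of an SVM-based classifier concrete, the following is a minimal sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. It is not the iSeeRNA implementation: the two standardized features, the toy values, the labels and all function names are hypothetical, and this excerpt does not state which features, kernel or SVM library iSeeRNA actually uses.

```python
from typing import List, Tuple

# Each sample is (feature vector, label): +1 = PCT, -1 = lincRNA.
# The two features are hypothetical standardized scores, chosen only
# so that the two classes are linearly separable in this toy example.
Sample = Tuple[List[float], int]


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(data: List[Sample], lr: float = 0.1,
                     lam: float = 0.01, epochs: int = 100):
    """Minimize lam/2 * |w|^2 + mean hinge loss by subgradient descent."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (dot(w, x) + b)
            # Shrinkage step coming from the L2 regularizer.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss step: only samples inside the margin contribute.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> str:
    """Classify by the sign of the decision function w.x + b."""
    return "PCT" if dot(w, x) + b >= 0 else "lincRNA"


# Hypothetical toy training set (not data from the paper).
PCT_FEATURES = [[1.0, 0.8], [0.6, 1.2], [0.5, 0.5], [1.2, 0.4]]
TRAINING: List[Sample] = []
for features in PCT_FEATURES:
    TRAINING.append((features, +1))
    TRAINING.append(([-v for v in features], -1))

W, B = train_linear_svm(TRAINING)
```

Calling `predict(W, B, features)` on a new feature vector returns either `"PCT"` or `"lincRNA"`. A production classifier would normally rely on an established SVM library, a kernel chosen by cross-validation, and features computed from real transcripts.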
Background
Over the past decade, evidence from numerous high-throughput
genomic platforms has revealed that even though less than 2% of
the mammalian genome encodes proteins, a significant fraction
can be transcribed into different complex families of
non-coding RNAs (ncRNAs) [1-4]. Other than microRNAs and other
families of small non-coding RNAs, long non-coding RNAs
(lncRNAs, >200 nt) are emerging as potent regulators of gene
expression [5]. Long intergenic non-coding RNAs (lincRNAs), a
subtype of lncRNAs originally identified by Guttman et al. [6]
from four mouse cell types using chromatin state maps, are
discrete transcriptional units located between known
protein-coding loci. Recent studies demonstrate the functional
significance of lincRNAs. However, it remains a daunting task
to identify all the lincRNAs present in various biological
processes and systems.
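The two defining properties mentioned above (length greater than 200 nt, and a position between known protein-coding loci) can be expressed as a simple pre-filter. The sketch below is an illustration under stated assumptions, not part of iSeeRNA: the `Interval` type, the function names, the half-open coordinate convention and the example coordinates are all hypothetical. Passing this filter does not show that a transcript is non-coding; separating candidates from PCTs still requires a classifier.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Genomic interval, 0-based and half-open: [start, end)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_lincrna_candidate(transcript: Interval, length_nt: int,
                         coding_loci: List[Interval],
                         min_length: int = 200) -> bool:
    """Keep transcripts longer than min_length that overlap no coding locus.

    length_nt is the spliced transcript length, which can be much
    shorter than the genomic span of the transcript.
    """
    if length_nt <= min_length:
        return False
    return not any(overlaps(transcript, locus) for locus in coding_loci)


# Hypothetical annotation of known protein-coding loci.
CODING_LOCI = [Interval("chr1", 1000, 5000), Interval("chr1", 20000, 30000)]

intergenic = is_lincrna_candidate(Interval("chr1", 8000, 12000), 850, CODING_LOCI)
genic = is_lincrna_candidate(Interval("chr1", 4500, 6000), 900, CODING_LOCI)
too_short = is_lincrna_candidate(Interval("chr1", 8000, 8150), 150, CODING_LOCI)
```

Here only the first transcript is kept: the second overlaps a coding locus and the third is shorter than the length threshold.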
Whole transcriptome sequencing, known as RNA-Seq, offers the
promise of rapid comprehensive discovery of novel genes and
transcripts [7]. With de novo assembly software such as
Cufflinks [8] and Scripture [6], a large set of novel assemblies
can be obtained from RNA-Seq data. Several programs have been
used to facilitate the cataloging of lincRNAs from RNA-Seq
assemblies. For example, Li et al. [9] used the Codon
Substitution Frequency (CSF) score [10] to identify lincRNAs
from de novo assembled transcripts in chicken skeletal muscle.
Pauli et al. [11] took advantage of the PhyloCSF score [12]
followed by other filtering steps to identify lincRNAs expressed
during zebrafish embryogenesis. Cabili et al. [13] also used the
PhyloCSF program to eliminate de novo assembled transcripts with
positive coding potential and identified ~8200 lincRNA loci in
24 human tissues. However, the extremely high computational time
demanded by PhyloCSF may become the bottleneck for handling
millions of assemblies generated from high-throughput
sequencing. Furthermore,
* Correspondence: xfsong@nuaa.edu.cn; huating.wang@cuhk.edu.hk;
haosun@cuhk.edu.hk
1 Li Ka Shing Institute of Health Sciences, The Chinese University of Hong
Kong, Shatin, New Territories, Hong Kong SAR, China
4 Department of Biomedical Engineering, Nanjing University of Aeronautics
and Astronautics, Nanjing 210016, China
Full list of author information is available at the end of the article
Sun et al. BMC Genomics 2013, 14(Suppl 2):S7
http://www.biomedcentral.com/1471-2164/14/S2/S7
© 2013 Sun et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.