没有合适的资源?快使用搜索试试~ 我知道了~
首页并行计算基因组数据的Burrow-Wheeler变换优化
并行计算基因组数据的Burrow-Wheeler变换优化
0 下载量 26 浏览量
更新于2024-08-26
收藏 136KB PDF 举报
"这篇研究论文探讨了在多核计算机上实现基因组数据的并行Burrow-Wheeler变换(BWT)算法。BWT作为FM索引和压缩后缀数组的核心,是生物信息学中对基因序列进行模式搜索的关键技术。它能在O(n)位的空间复杂度下运行,通常所需的总空间小于或等于原始序列本身,且在精确模式匹配中表现出高效性能。然而,从序列中计算BWT的传统方法在时间和空间效率上存在挑战。 在该文章中,作者Zhiheng ZHAO、Jianping YIN、Wei XIONG和Jun LONG来自中国国防科技大学计算机学院,他们提出了一种基于增量思想的高效并行算法,该算法在空间效率方面有所提升。分析和实验表明,他们的算法在多核计算机上运行非常高效,并具有良好的可扩展性,特别是对于处理长序列时。实验证明,该算法仅需几分钟即可在常见的四核桌面计算机上完成人类全基因组的BWT计算。 Burrow-Wheeler变换是一种文本预处理技术,通过排列字符形成一个新的矩阵,然后将矩阵最后一列作为BWT输出。在并行计算环境下,这个过程被分解为多个任务,分配到不同的核心上执行,从而提高了计算速度。这种并行化策略使得大型基因组数据的处理变得更加可行,对于生物信息学中的大规模序列分析有着重要的应用价值。 文章详细介绍了算法的设计原理,包括如何在并行环境中有效地分配和协调工作负载,以及如何利用多核架构来优化计算流程。此外,实验部分还提供了性能评估和对比,可能包括与其他现有算法的比较,以证明新算法的优越性。这些结果对于理解和优化基因组数据处理的计算效率具有指导意义,对于开发更高效的生物信息学工具具有深远影响。"
资源详情
资源推荐
Journal of Computational Information Systems 8: 16 (2012) 6935–6942
Available at http://www.Jofcis.com
Parallel Burrow-wheeler Transform for Genomic Data on
Multi-core Computers
⋆
Zhiheng ZHAO
∗
, Jianping YIN, Wei XIONG, Jun LONG
School of Computer, National University of Defense Technology, Changsha 410073, China
Abstract
As the core of FM-index and compressed suffix array, the Burrows-Wheeler Transform (BWT) plays
a key role in indexing genomic sequence data for pattern search. It can run in O(n) bits, typically in
total space less than or equal to that of the sequences themselves, and is high efficient in exact pattern
matching. However, computing a BWT from a sequence in efficient space and time is challenging. In
this article, we present an efficient parallel algorithm for computing BWT on multi-core computers. Our
parallel algorithm is based on an incremental idea which is space-efficient. The analysis and experiments
show that our algorithm is very efficient on multi-core computers and has good scalability, especially for
long sequences. It only takes about a few minutes to compute the BWT of human whole genome on a
prevalent 4-core desktop computer.
Keywords: Burrows-wheeler Transform; Parallel Algorithm; Pattern Search; Multi-core
1 Introduction
During the last decade the amount of genomic sequence data has increased exponentially and
this tend will certainly continue for the next few years. This poses challenges for more efficient
algorithms and applications to analyze these data. Searching Patterns in genomic data is an
important step in sequence analysis. To speed up the search process in large genomic sequence
data, time-efficient and space-efficient data structure must be required to index these sequences.
Suffix tree and suffix array have been widely used in pattern search algorithms [1][2], for they
can locate a pattern efficiently in O(k) time and O(klog(n)) time, where k and n denote the
length of the patten and the indexed sequence, respectively. However, suffix tree and suffix array
require O(nlog(n)) bits, several times the space of the indexed sequences themselves. To index
the human genome which contains 2.9 Giga bases, suffix tree needs over 50 Giga bytes and suffix
array needs over 12 Giga bytes in practice, which far exceed the capacity of a PC nowadays.
Hence, space-efficient data structures become more important than time-efficient ones for large
⋆
Project supported by the National Nature Science Foundation of China (Project No. 61105050, 61170287 and
60970034).
∗
Corresponding author.
Email address: wader zzh@hotmail.com (Zhiheng ZHAO).
1553–9105 / Copyright © 2012 Binary Information Press
August 15, 2012
下载后可阅读完整内容,剩余7页未读,立即下载
weixin_38614112
- 粉丝: 3
- 资源: 930
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- 十种常见电感线圈电感量计算公式详解
- 军用车辆:CAN总线的集成与优势
- CAN总线在汽车智能换档系统中的作用与实现
- CAN总线数据超载问题及解决策略
- 汽车车身系统CAN总线设计与应用
- SAP企业需求深度剖析:财务会计与供应链的关键流程与改进策略
- CAN总线在发动机电控系统中的通信设计实践
- Spring与iBATIS整合:快速开发与比较分析
- CAN总线驱动的整车管理系统硬件设计详解
- CAN总线通讯智能节点设计与实现
- DSP实现电动汽车CAN总线通讯技术
- CAN协议网关设计:自动位速率检测与互连
- Xcode免证书调试iPad程序开发指南
- 分布式数据库查询优化算法探讨
- Win7安装VC++6.0完全指南:解决兼容性与Office冲突
- MFC实现学生信息管理系统:登录与数据库操作
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功