GPU-Accelerated H-BLAST: An Efficient Tool for Protein Sequence Alignment in Bioinformatics

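H-BLAST is a toolkit designed to accelerate protein sequence alignment on heterogeneous computers equipped with GPUs. As biological sequence databases grow explosively, conventional alignment software needs to become faster. To meet this need, H-BLAST was developed by Weicai Ye, Ying Chen, Yongdong Zhang and Yuesheng Xu, affiliated with the School of Data and Computer Science and the Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University, and with the Department of Mathematics at Syracuse University, USA (Professor Emeritus).

H-BLAST targets a fundamental problem in biological sequence analysis: sequence alignment. BLAST, the tool routinely used for this purpose, has accumulated more than 118 000 citations over the past two decades. As genomic data volumes grow, the computational speed of the software that processes them becomes critical. H-BLAST combines the computing power of the CPU (central processing unit) and the GPU (graphics processing unit) to search in parallel, and it specifically accelerates BLASTX and BLASTP, two basic tools of NCBI-BLAST (NCBI: National Center for Biotechnology Information). On a heterogeneous platform the two processors play to their different strengths: the CPU manages and coordinates tasks, while the GPU's parallel cores carry out the highly repetitive alignment computations. According to the paper's abstract, H-BLAST produces the same alignment results as NCBI-BLAST while running faster than the CPU-only tool.

The paper was received on August 6, 2016, revised on November 7, 2016, and accepted on December 12, 2016. H-BLAST gives genomics researchers a faster protein sequence alignment tool suited to data-intensive workloads.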

Sequence analysis

H-BLAST: a fast protein sequence alignment toolkit on heterogeneous computers with GPUs

Weicai Ye, Ying Chen, Yongdong Zhang* and Yuesheng Xu (1,2)

(1) School of Data and Computer Science, and Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou 510275, People’s Republic of China and (2) Professor Emeritus of Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA

*To whom correspondence should be addressed.

Associate Editor: John Hancock

Received on August 6, 2016; revised on November 7, 2016; editorial decision on November 29, 2016; accepted on December 12, 2016

Abstract

Motivation: Sequence alignment is a fundamental problem in bioinformatics. BLAST is a routinely used tool for this purpose, with over 118 000 citations in the past two decades. As the size of bio-sequence databases grows exponentially, the computational speed of alignment software must be improved.

Results: We develop the heterogeneous BLAST (H-BLAST), a fast parallel search tool for a heterogeneous computer that couples CPUs and GPUs, to accelerate BLASTX and BLASTP, basic tools of NCBI-BLAST. H-BLAST employs a locally decoupled seed-extension algorithm for better performance on GPUs, and offers a performance tuning mechanism for better efficiency across various combinations of CPUs and GPUs. H-BLAST produces the same alignment results as NCBI-BLAST, and its computational speed is much faster than that of NCBI-BLAST. Speedups achieved by H-BLAST over sequential NCBI-BLASTP (resp. NCBI-BLASTX) range mostly from 4 to 10 (resp. 5 to 7.2). With 2 CPU threads and 2 GPUs, H-BLAST can be faster than 16-threaded NCBI-BLASTX. Furthermore, H-BLAST is 1.5–4 times faster than GPU-BLAST.

Availability and Implementation: https://github.com/Yeyke/H-BLAST.git

Contact: yux06@syr.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

In bioinformatics, the basic local alignment search tool (BLAST) (Altschul et al., 1990, 1997) is not only an algorithm used daily to identify regions of local similarity between biological sequences, but also ‘the principal means by which many other algorithms query large genomic datasets’ (Loh et al., 2012). Hence, BLAST has found fruitful applications in inferring functional (Mackelprang et al., 2011) and evolutionary (Huang et al., 2014) relationships between the corresponding organisms. As next-generation sequencing (NGS) techniques advance, the size of sequence databases has grown exponentially (Daniels et al., 2013) and the search speed of BLAST has become insufficient, making sequence alignment with BLAST a major bottleneck. To address this problem, a number of tools have been developed in the literature.
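As background for the seeding strategies discussed in this introduction, the seed-and-extend idea behind BLAST-style search can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the NCBI-BLAST or H-BLAST implementation: it assumes exact word matches as seeds, a flat +1/-1 score in place of a substitution matrix, and ungapped X-drop extension only. All names and parameters (`index_query_words`, `extend_one_side`, `seed_and_extend`, `w`, `x_drop`) are hypothetical.

```python
from collections import defaultdict


def index_query_words(query, w=3):
    """Map every length-w word of the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index


def extend_one_side(query, subject, qi, si, step, match, mismatch, x_drop):
    """Extend without gaps from (qi, si) in direction `step` (+1 or -1).

    Returns (best_gain, best_length): the highest score reached and the
    number of residues needed to reach it. Extension stops once the
    running score falls x_drop or more below the best score so far.
    """
    score = best = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += match if query[qi] == subject[si] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= x_drop:
            break
        qi += step
        si += step
    return best, best_len


def seed_and_extend(query, subject, w=3, match=1, mismatch=-1, x_drop=2):
    """Return sorted (score, query_start, subject_start, length) hits."""
    index = index_query_words(query, w)  # the query is indexed, not the database
    hits = set()  # seeds on the same diagonal may extend to the same hit
    for si in range(len(subject) - w + 1):
        for qi in index.get(subject[si:si + w], ()):
            left_gain, left_len = extend_one_side(
                query, subject, qi - 1, si - 1, -1, match, mismatch, x_drop)
            right_gain, right_len = extend_one_side(
                query, subject, qi + w, si + w, +1, match, mismatch, x_drop)
            score = w * match + left_gain + right_gain
            hits.add((score, qi - left_len, si - left_len,
                      left_len + w + right_len))
    return sorted(hits, reverse=True)


print(seed_and_extend("MKTAYIAKQR", "GGMKTAYLAKQRGG"))
```

In this toy example the five exact seeds all lie on one diagonal and extend to the same ungapped hit, `(8, 0, 2, 10)`: ten aligned residues with nine matches and one mismatch. Because each seed is extended independently of the others, this stage is a natural candidate for parallel execution.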

There are two categories of methods to accelerate BLAST searching against protein databases. Methods in the first category change the indexing target from the query, as in BLAST, to the database, and modify the non-exact match seeding strategy by using a reduced amino-acid alphabet and spaced seeds. Commonly used software tools in this category include BLAT (Kent, 2002), USEARCH (Edgar, 2010), RAPSearch2 (Zhao et al., 2012) and DIAMOND (Buchfink et al., 2015). Methods in the second category are parallel implementations on specific hardware, such as FPGAs (field-programmable gate arrays) (Fei et al., 2008; Herbordt et al., 2006; Wienbrandta et al., 2011), Cell Broadband Engines (Zhang et al., 2008), multi-core CPUs (Camacho et al., 2008), graphics processing units (GPUs) only (Cheng and Benkridb, 2010; Suzuki et al., 2012), heterogeneous computers with GPUs (Liu et al.; Liu et al., 2011; Vouzis and

Bioinformatics, 33(8), 2017, 1130–1138

doi: 10.1093/bioinformatics/btw769

Advance Access Publication Date: 12 January 2017

Original Paper
