CMSA: A Heterogeneous CPU-GPU System for Accelerating Alignment of Multiple Similar RNA/DNA Sequences

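"CMSA is a heterogeneous CPU-GPU computing system dedicated to aligning multiple similar RNA/DNA sequences." In bioinformatics, multiple sequence alignment (MSA) is a classic and powerful sequence analysis technique. As biological datasets grow rapidly, parallelizing MSA has become essential to keep its running time within an acceptable range. Although MSA has been studied extensively, existing approaches are either insufficient or rely on implicit assumptions that limit their generality.

First, the properties of users' sequences, including the size of the dataset and the lengths of the sequences, can take arbitrary values and are usually unknown before submission, yet previous work has often overlooked this point. This complicates MSA optimization and performance prediction, because the algorithm must adapt to inputs of varying size and length. Second, the center star strategy is suited to aligning similar sequences, but its first step, center sequence selection, is very time-consuming and needs further optimization. Optimizing center sequence selection both improves alignment efficiency and reduces the consumption of computing resources.

Targeting current heterogeneous CPU-GPU platforms, CMSA exploits the strengths of both kinds of hardware. CPUs handle complex control flow and diverse data well, whereas GPUs excel at running large numbers of parallel computations. By distributing work across the CPU and GPU, CMSA parallelizes sequence alignment efficiently and substantially increases processing speed. In addition, CMSA may use dynamic programming methods, such as parallel versions of the Smith-Waterman or Needleman-Wunsch algorithms, to perform the alignments. These algorithms find optimal pairwise alignments between sequences while accounting for evolutionary distance and sequence similarity. To cope with unknown sequence sizes and counts, CMSA may also include adaptive memory management and task scheduling so that hardware resources are used efficiently.

In summary, CMSA offers an innovative solution to the multiple sequence alignment problem in bioinformatics: it accounts for the uncertainty of users' sequences, optimizes the center sequence selection step, and exploits the characteristics of heterogeneous computing platforms. With these strategies, CMSA aims to be a flexible, efficient, and adaptable tool for the growing demand for biological sequence analysis.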

Chen et al. BMC Bioinformatics (2017) 18:315

DOI 10.1186/s12859-017-1725-6

SOFTWARE Open Access

CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment

Xi Chen, Chen Wang, Shanjiang Tang, Ce Yu and Quan Zou

Abstract

Background: Multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, existing approaches are either insufficient or contain implicit assumptions that limit their generality. First, the information about users' sequences, including the sizes of datasets and the lengths of sequences, can take arbitrary values and is generally unknown before submission, a fact that previous work has unfortunately ignored. Second, the center star strategy is suited for aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by running the workload on both CPU and GPU simultaneously.
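
The co-run idea described above can be illustrated with a minimal Python sketch. It is not CMSA's implementation: the excerpt does not say how CMSA divides work between devices, so the fixed `gpu_share` ratio, the function names `split_workload` and `co_run`, and the use of plain Python callables as stand-ins for the CPU and GPU code paths are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_workload(tasks, gpu_share):
    """Split tasks into (cpu_tasks, gpu_tasks) by a fixed ratio in [0, 1].

    The static ratio is a hypothetical policy chosen for this sketch.
    """
    if not 0.0 <= gpu_share <= 1.0:
        raise ValueError("gpu_share must be between 0 and 1")
    cut = round(len(tasks) * gpu_share)
    # The first `cut` tasks go to the GPU stand-in, the rest to the CPU stand-in.
    return tasks[cut:], tasks[:cut]


def co_run(tasks, cpu_worker, gpu_worker, gpu_share=0.75):
    """Process both partitions concurrently and return results in input order."""
    cpu_tasks, gpu_tasks = split_workload(tasks, gpu_share)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each partition runs in its own thread so neither "device" sits idle.
        gpu_future = pool.submit(lambda: [gpu_worker(t) for t in gpu_tasks])
        cpu_future = pool.submit(lambda: [cpu_worker(t) for t in cpu_tasks])
        return gpu_future.result() + cpu_future.result()
```

For example, `co_run(sequences, cpu_worker=len, gpu_worker=len, gpu_share=0.5)` hands the first half of `sequences` to one worker and the rest to the other. In a real system the ratio would presumably be tuned to the relative throughput of the two devices rather than fixed in advance.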

Results: This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from $O(mn^2)$ to $O(mn)$. The experimental results show that CMSA achieves up to an 11× speedup and outperforms the state-of-the-art software.
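
For context, the textbook center star strategy picks as center the sequence whose total pairwise distance to all other sequences is smallest. The sketch below shows that baseline selection step only; it is not CMSA's improved algorithm, whose details are not given in this excerpt. Using Levenshtein edit distance as the pairwise score, and the names `edit_distance` and `select_center`, are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def select_center(sequences):
    """Return the index of the sequence with the smallest total distance to the rest."""
    if not sequences:
        raise ValueError("need at least one sequence")
    m = len(sequences)
    totals = [0] * m
    for i in range(m):
        for j in range(i + 1, m):
            # Each unordered pair is compared once and credited to both sequences.
            d = edit_distance(sequences[i], sequences[j])
            totals[i] += d
            totals[j] += d
    # Ties are broken in favor of the earliest sequence.
    return min(range(m), key=totals.__getitem__)
```

Every pair of sequences triggers a full dynamic-programming pass here, which is why this stage dominates the running time on large datasets and why it is a natural target for optimization.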

Conclusion: CMSA focuses on the alignment of multiple similar RNA/DNA sequences and proposes a novel bitmap-based algorithm to improve the center star strategy. We conclude that harnessing the high performance of modern GPUs is a promising approach to accelerating multiple sequence alignment. In addition, adopting the co-run computation model can significantly increase overall system utilization. The source code is available at https://github.com/wangvsa/CMSA.

Keywords: Heterogeneous, GPU, Multiple sequence alignment (MSA), Center star alignment

Background

Multiple sequence alignment (MSA) refers to the problem of aligning three or more sequences with or without inserting gaps between the symbols [1]. It is a fundamental tool for similar-sequence analysis in computational biology and molecular function prediction. In computational molecular biology, similar DNA sequences are aligned to identify single nucleotide polymorphisms and copy-number variants, which are key topics in genetics [2]. In molecular function prediction, large-scale similar DNA sequence alignment is required for the evolutionary analysis of bacterial and viral genomes [3]. Therefore, MSA software needs to be efficient and scalable enough to handle large-scale datasets, which may contain hundreds of thousands of similar sequences.

MSA has exponential time complexity and has been proven to be NP-complete [4]. Many heuristic algorithms have been developed and implemented in previous studies, including Kalign [5], MAFFT [6] and

*Correspondence: yuce@tju.edu.cn

School of Computer Science and Technology, Tianjin University, Yaguan Road, Tianjin, China

International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
