Chen et al. BMC Bioinformatics
(2017) 18:315
DOI 10.1186/s12859-017-1725-6
SOFTWARE Open Access
CMSA: a heterogeneous CPU/GPU
computing system for multiple similar
RNA/DNA sequence alignment
Xi Chen, Chen Wang, Shanjiang Tang, Ce Yu
*
and Quan Zou
Abstract
Background: The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in
bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running
time in an acceptable level. Although there are a lot of work on MSA problems, their approaches are either insufficient
or contain some implicit assumptions that limit the generality of usage. First, the information of users’ sequences,
including the sizes of datasets and the lengths of sequences, can be of arbitrary values and are generally unknown
before submitted, which are unfortunately ignored by previous work. Second, the center star strategy is suited for
aligning similar sequences. But its first stage, center sequence selection, is highly time-consuming and requires further
optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider the MSA parallelization
on GPU devices only, making the CPUs idle during the computation. Co-run computation, however, can maximize the
utilization of the computing resources by enabling the workload computation on both CPU and GPU simultaneously.
Results: This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous
CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users’ submitted
sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices
are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its
center sequence selection process from O(mn
2
) to O(mn). The experimental results show that CMSA achieves an up
to 11× speedup and outperforms the state-of-the-art software.
Conclusion: CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap
based algorithm to improve the center star strategy. We can conclude that harvesting the high performance of
modern GPU is a promising approach to accelerate multiple sequence alignment. Besides, adopting the co-run
computation model can maximize the entire system utilization significantly. The source code is available at https://
github.com/wangvsa/CMSA.
Keywords: Heterogeneous, GPU, Multiple sequence alignment (MSA), Center star alignment
Background
Multiple sequence alignment (MSA) refers to the prob-
lem of aligning three or more sequences with or without
inserting gaps between the symbols [1]. It is a fundamental
tool for similar sequences analysis in computational biol-
ogy and molecular function prediction. In computational
molecular biology, similar DNA sequences are aligned
to find out the single nucleotide polymorphism and the
*Correspondence: yuce@tju.edu.cn
School of Computer Science and Technology, Tianjin University, Yaguan Road,
Tianjin, China
copy-number variant, which is the key content in genetics
[2]. In molecular function prediction, large-scale similar
DNA sequence alignment is required when addressing the
evolutionary analysis of bacterial and viral genomes [3].
Therefore, MSA software need to be efficient and scal-
able to handle large-scale datasets, which may contain
hundreds of thousands of similar sequences.
MSA is a problem with an exponential time complex-
ity, it has been proven to be NP-complete [4]. Many
heuristic algorithms are developed and implemented by
previous studies, including Kalign [5], MAFFT [6] and
© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.