A Multiple Sequence Alignment Algorithm Based on Consensus Sequence Search with Simulated Annealing and Star Alignment

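This research paper proposes a new multiple sequence alignment algorithm that combines simulated annealing with star alignment to search efficiently for a consensus sequence and thereby improve the multiple sequence alignment process. The work was carried out jointly by researchers from Beijing Union University, Beijing Institute of Petro-Chemical Technology, and Tsinghua University, who explore how the two methods can improve sequence analysis in bioinformatics.

Sequence alignment is a core task in bioinformatics. It compares biological sequences and measures their similarity in order to understand the characteristics of a gene family. Consensus sequence search is a key step in this process because it reveals the patterns shared among sequences and simplifies the subsequent multiple sequence alignment. The algorithm proposed in the paper combines a simulated annealing strategy with star alignment to improve both the efficiency and the accuracy of alignment.

Simulated annealing is a global optimization algorithm derived from the annealing process in solid-state physics. By introducing randomness into the search, it avoids becoming trapped in a local optimum and so has a higher probability of finding the global optimum. In sequence alignment, simulated annealing can be used to explore different candidate arrangements of the sequences in order to find the best alignment.

Star alignment is a simplified form of multiple sequence alignment. It first aligns every sequence with a central reference sequence and then uses those pairwise alignments as the basis for the alignment of the whole sequence set. The method is relatively simple and computationally efficient, but it may lose information about more complex structure.

In the paper, the researchers combine the two methods: star alignment quickly produces a preliminary consensus sequence, and simulated annealing then optimizes it, so that the chance of reaching the global optimum is retained while the computational cost is reduced. In this way the authors may have addressed problems of traditional multiple sequence alignment algorithms, such as heavy computation and a tendency to fall into local optima. They may also have evaluated the performance of the algorithm, compared it with popular existing algorithms (such as ClustalW and MUSCLE), and analyzed its behavior for different sequence lengths and complexities. In addition, they may have discussed the potential of the algorithm in bioinformatics applications such as evolutionary analysis, gene function prediction, and disease association studies.

The paper shows how combining simulated annealing with star alignment can provide a novel and efficient approach to consensus sequence search and multiple sequence alignment. The proposed algorithm not only helps bioinformaticians analyze large amounts of biological sequence data more accurately but may also offer useful ideas for future algorithm design and optimization.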

An algorithm of multiple sequence alignment based on consensus sequence searched by simulated annealing and star alignment

Dengfeng Yao / Beijing Union University
Beijing Key Lab of Information Service Eng., Beijing Union University, Beijing, China
e-mail: yaodengfeng@gmail.com

Xu You / Beijing Institute of Petro-Chemical Technology
Department of Mathematics and Physics, Beijing Institute of Petro-Chemical Technology, Beijing, P.R. China
e-mail: youxu@bipt.edu.cn

Abudoukelimu.abulizi / Tsinghua University
Lab of Computational Linguistics, School of Humanities, Tsinghua University, Beijing, China
e-mail: keram1106@163.com

Renkui Hou / Tsinghua University
Lab of Computational Linguistics, School of Humanities, Tsinghua University, Beijing, China
e-mail: hourk0917@163.com

Abstract—Sequence alignment is an important method of sequence analysis in bioinformatics, through which the characteristics of a gene family can be conveniently determined from a consensus sequence. Finding a consensus sequence prior to multiple sequence alignment makes the alignment easier. Thus, the consensus sequence is produced here by simulated annealing and star alignment algorithms. Furthermore, star alignment can be used to align each sequence against the consensus sequence.

Keywords—multiple sequence alignment, simulated annealing, consensus sequence, star alignment

1. INTRODUCTION

A consensus sequence is a way of presenting multiple sequence alignment results and merging highly similar sequence fragments. The overall characteristics of a gene family and the common elements in a sequence family can be conveniently determined from the consensus sequence.
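As a small illustration of how a consensus sequence summarizes an existing alignment, the sketch below takes a majority vote in each aligned column. The function name `column_consensus`, the gap symbol `-`, and the rule of dropping gap-majority columns are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def column_consensus(aligned, gap="-"):
    """Majority-vote consensus over the columns of an existing alignment.

    `aligned` is a list of equal-length strings (hypothetical input format).
    Columns whose most frequent symbol is the gap are dropped.
    """
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for column in zip(*aligned):
        # most_common(1) returns the most frequent symbol in this column;
        # on a tie, the symbol encountered first in the column wins.
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


aligned = ["ACG-TA", "ACGATA", "AC-ATA", "TCGATA"]
print(column_consensus(aligned))  # prints ACGATA
```

A plain majority vote ignores substitution scores and sequence weighting, so it should be read only as the simplest possible way to condense aligned columns.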

However, a problem closely related to the consensus sequence is the construction of a multiple sequence alignment, which is an important component of modern bioinformatics. Multiple sequence alignment supports many important applications, including DNA detection, structure and function analysis, and the study of the evolution and relationships of RNA and protein families. Multiple sequence alignment methods include exact alignment algorithms such as the Needleman–Wunsch algorithm [1] and the Carrillo–Lipman algorithm [2], progressive alignment algorithms such as CLUSTALW [3], and iterative alignment algorithms such as SAGA [4], which is based on a genetic algorithm. Although these methods have achieved good results, many of them still require improvement, and none is yet a perfect algorithm for multiple sequence alignment. Hence, the biological significance of the alignments they produce still needs continuous improvement.

In addition to using simulated annealing and the star alignment algorithm to construct the consensus sequence, this research also conducted multiple sequence alignment experiments based on the consensus sequence. Once the consensus sequence has been found, star alignment can readily be used to build the multiple sequence alignment around it [6]. Therefore, this research mainly investigates the consensus sequence.
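One way to picture star alignment around a known consensus is sketched below: each input sequence is globally aligned to the consensus (used as the star center), and the pairwise alignments are merged by reserving, before every center position, the widest gap run that any pairwise alignment requires. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function names are assumptions chosen for illustration; the paper's actual scoring scheme is not given in this excerpt.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment with a linear gap penalty (assumed scores)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def star_align(center, sequences):
    """Merge pairwise alignments to `center` into one multiple alignment."""
    pairs = [needleman_wunsch(center, s) for s in sequences]
    # inserts[k]: widest gap run any pairwise alignment opens before center[k]
    # (index len(center) covers insertions after the last center symbol).
    inserts = [0] * (len(center) + 1)
    for c_aln, _ in pairs:
        k = run = 0
        for ch in c_aln:
            if ch == "-":
                run += 1
            else:
                inserts[k] = max(inserts[k], run)
                k, run = k + 1, 0
        inserts[k] = max(inserts[k], run)
    rows = []
    for c_aln, s_aln in pairs:
        row, k, run = [], 0, 0
        for c_ch, s_ch in zip(c_aln, s_aln):
            if c_ch == "-":
                row.append(s_ch); run += 1
            else:
                row.append("-" * (inserts[k] - run))  # pad to shared width
                row.append(s_ch)
                k, run = k + 1, 0
        row.append("-" * (inserts[k] - run))
        rows.append("".join(row))
    center_row = "".join("-" * inserts[k] + center[k]
                         for k in range(len(center)))
    center_row += "-" * inserts[len(center)]
    return center_row, rows


center_row, rows = star_align("ACGT", ["ACGT", "AGT", "ACGGT"])
print(center_row)
for row in rows:
    print(row)
```

Because every sequence is aligned only to the center, the cost is one pairwise alignment per input sequence, which is why a good center (here, the consensus) matters so much for the quality of the result.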

Day and McMorris (1993) [7] proved that finding the consensus sequence is an NP-complete problem. Two main methods are used to determine the consensus sequence. The first, traditional method performs multiple sequence alignment and then extracts the elements of each aligned column to obtain the consensus sequence. The second method searches the solution space for the consensus sequence using simulated annealing [5]. Starting from an empty sequence, it gradually generates a sequence with minimum distance to the input sequence set, and the consensus sequence is obtained at the end of annealing. These two methods are performed from different
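The simulated annealing search described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that "distance" means the sum of Levenshtein edit distances to the input sequences, that a move is a single random insertion, deletion, or substitution, and that the temperature follows a geometric cooling schedule. All names and parameter values (`anneal_consensus`, `t_start`, `t_end`, `cooling`) are hypothetical.

```python
import math
import random


def edit_distance(a, b):
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def total_distance(candidate, sequences):
    """Objective: summed edit distance from the candidate to every input."""
    return sum(edit_distance(candidate, s) for s in sequences)


def mutate(candidate, alphabet, rng):
    """Apply one random insertion, deletion, or substitution."""
    ops = ["insert", "delete", "substitute"] if candidate else ["insert"]
    op = rng.choice(ops)
    if op == "insert":
        pos = rng.randrange(len(candidate) + 1)
        return candidate[:pos] + rng.choice(alphabet) + candidate[pos:]
    pos = rng.randrange(len(candidate))
    if op == "delete":
        return candidate[:pos] + candidate[pos + 1:]
    return candidate[:pos] + rng.choice(alphabet) + candidate[pos + 1:]


def anneal_consensus(sequences, alphabet="ACGT", t_start=2.0, t_end=0.01,
                     cooling=0.995, seed=0):
    """Search for a low-distance consensus, starting from the empty sequence."""
    rng = random.Random(seed)
    current = ""
    current_cost = total_distance(current, sequences)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        candidate = mutate(current, alphabet, rng)
        cost = total_distance(candidate, sequences)
        delta = cost - current_cost
        # Metropolis rule: always accept improvements, and accept a worse
        # candidate with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling  # geometric cooling (assumed schedule)
    return best, best_cost


seqs = ["ACGTAC", "ACGTTC", "ACGAC"]
consensus, cost = anneal_consensus(seqs)
print(consensus, cost)
```

Tracking the best candidate seen so far means the returned sequence is never worse than the empty starting point, although, as with any annealing run, there is no guarantee that it is the true minimum-distance sequence.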
