ANNOVAR: A Functional Annotation Tool for Genetic Variants from High-Throughput Sequencing Data


"ANNOVAR是用于高通量测序数据的功能注释工具，旨在帮助研究人员从海量遗传变异数据中找出具有功能重要性的变异。这款工具能够对单核苷酸变异（SNVs）和插入/缺失进行注释，包括分析它们对基因的影响、推断细胞遗传带、报告功能重要性评分、寻找保守区域内的变异以及识别1000 Genomes Project和dbSNP数据库中的变异。ANNOVAR可以利用UCSC Genome Browser或其他符合Generic Feature Format标准的注释数据集。" 正文: 《ANNOVAR在高通量测序数据功能注释中的应用》随着高通量测序技术的发展，科研人员得以快速获取大量遗传变异数据，但如何从这些数据中筛选出具有生物学意义的变异仍然是一个挑战。为了解决这一问题，ANNOVAR工具应运而生，它是一款高效的功能注释软件，可帮助研究人员快速鉴定并解析遗传变异的功能特性。 ANNOVAR的主要功能集中在以下几个方面： 1. **单核苷酸变异(SNVs)和插入/缺失(Indels)的注释**：ANNOVAR能够对SNVs和Indels进行精确的定位和分类，如编码区变异、非编码区变异、剪接位点变异等，这有助于理解这些变异可能带来的生物学效应。 2. **基因功能影响分析**：通过对变异与基因结构的关系进行分析，ANNOVAR可以评估变异是否会影响基因的编码序列、启动子、增强子等元件，从而推测其可能对基因表达和功能的影响。 3. **细胞遗传带推断**：通过比对变异位置与细胞遗传带的对应关系，ANNOVAR可以提供变异在染色体上的位置信息，这有助于进一步研究遗传变异与疾病关联的染色体区域。 4. **功能重要性评分**：ANNOVAR可以报告变异的功能重要性评分，例如根据进化保守性、多态性等特征给出评分，帮助研究人员优先考虑那些可能具有更大影响的变异。 5. **保守区域变异检测**：在高度保守的基因区域发现的变异往往更可能对功能有显著影响。ANNOVAR可以找到这些区域内的变异，为后续研究提供线索。 6. **数据库对比**：ANNOVAR能够对接1000 Genomes Project和dbSNP等公共数据库，从而确定已知变异或发现新的罕见变异，这对于遗传疾病的诊断和研究具有重要意义。 7. **灵活的数据源支持**：除了使用UCSC Genome Browser的注释数据，ANNOVAR还支持其他符合Generic Feature Format (GFF)标准的注释集，使得用户可以根据需求选择最合适的注释资源。 ANNOVAR的这些特性使得它成为生物信息学研究中不可或缺的工具，它简化了对大规模遗传变异数据的处理和分析，为研究遗传变异与疾病之间的关系提供了强大的支持。然而，需要注意的是，尽管ANNOVAR提供了丰富的注释信息，但最终的生物学解释还需要结合实验验证和其他生物信息学工具进行综合分析。因此，ANNOVAR在遗传学研究中起着至关重要的辅助作用，是推动基因组学研究向前发展的重要驱动力。

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

Kai Wang (1,*), Mingyao Li (2) and Hakon Hakonarson (1,3)

(1) Center for Applied Genomics, Children’s Hospital of Philadelphia, (2) Department of Biostatistics and Epidemiology and (3) Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA

Received March 27, 2010; Revised June 2, 2010; Accepted June 18, 2010

ABSTRACT

High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a ‘variants reduction’ protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires 4 min to perform gene-based annotation and 15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
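The stepwise 'variants reduction' idea can be sketched in Python as a chain of filters with a running count after each step. This is a toy illustration, not the protocol as implemented in ANNOVAR: the abstract does not list the individual filters, so the steps chosen here (keep protein-changing variants, drop variants seen in 1000 Genomes or dbSNP, then require at least two surviving variants per gene under a recessive model) are assumptions, as are all field names, gene names and records.

```python
from collections import Counter

# Toy variant records invented for illustration; the field names are assumptions.
variants = [
    {"id": "v1", "gene": "GENE_A", "effect": "nonsynonymous", "in_dbsnp": False, "in_1000g": False},
    {"id": "v2", "gene": "GENE_A", "effect": "nonsynonymous", "in_dbsnp": False, "in_1000g": False},
    {"id": "v3", "gene": "GENE_B", "effect": "synonymous", "in_dbsnp": False, "in_1000g": False},
    {"id": "v4", "gene": "GENE_C", "effect": "nonsynonymous", "in_dbsnp": True, "in_1000g": True},
    {"id": "v5", "gene": "GENE_D", "effect": "frameshift", "in_dbsnp": False, "in_1000g": False},
    {"id": "v6", "gene": None, "effect": "intergenic", "in_dbsnp": False, "in_1000g": False},
]

# Assumed set of effect labels treated as protein-changing.
PROTEIN_CHANGING = {"nonsynonymous", "frameshift", "stopgain", "splicing"}


def reduce_variants(variants):
    """Apply each filter in turn and record how many variants survive each step."""
    steps = [
        ("protein-changing", lambda v: v["effect"] in PROTEIN_CHANGING),
        ("absent from 1000 Genomes", lambda v: not v["in_1000g"]),
        ("absent from dbSNP", lambda v: not v["in_dbsnp"]),
    ]
    remaining = list(variants)
    log = [("input", len(remaining))]
    for name, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        log.append((name, len(remaining)))
    return remaining, log


def recessive_candidates(variants, min_hits=2):
    """Under a recessive model, keep genes carrying at least `min_hits` surviving variants."""
    counts = Counter(v["gene"] for v in variants)
    return sorted(gene for gene, n in counts.items() if n >= min_hits)


remaining, log = reduce_variants(variants)
candidates = recessive_candidates(remaining)
for name, count in log:
    print(f"{name}: {count}")
print("candidate genes:", candidates)
```

Counting variants per gene is a simplification of a recessive model: it does not distinguish a homozygous variant from two heterozygous ones, and it cannot tell whether two variants are in trans.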

INTRODUCTION

High-throughput sequencing data have been produced at unprecedented rates for diverse genomes. There is a strong need for novel informatics and analytical strategies, including methods for sequencing read alignment, variant identification, genotype calling and association tests, in order to take advantage of the massive amounts of sequencing data. Dozens of short-read alignment programs with different functionalities are now available (1), as well as several single nucleotide variant (SNV) and copy number variant (CNV) calling algorithms (2). However, there is a paucity of methods that can simultaneously handle a large number of called variants (typically >3 million variants for a given human genome) and annotate their functional impacts, despite the fact that this is an important task in many sequencing applications. Even when sequencing only exonic regions for Mendelian diseases such as Freeman–Sheldon syndrome, each subject still carries a total of 20 000 variants, but only two variants in trans are the true disease causal mutations (3). Therefore, identifying a small subset of functionally important variants from large amounts of sequencing data is important to pinpoint potential disease causal genes and causal mutations.

Several reasons motivate us to develop a functional annotation pipeline for genetic variants. First, although companies that manufacture sequencing machines or provide sequencing services typically offer software for functional annotation, such software is usually specific to one sequencing platform and cannot be extended to handle users’ specific needs (such as using different genome builds or gene annotations). Second, although several databases have been developed for the functional annotation of SNPs or CNVs (4–6), most of them are limited to known variants, typically those reported in dbSNP or CNV databases. We note that some exceptions exist (7); for example, the F-SNP tool (8) and Seattle Seq tool (http://gvs.gs.washington.edu/SeattleSeqAnnotation/) can be used for annotation of novel SNPs. Third, several previously developed mutation prediction algorithms, such as SIFT (9) and PolyPhen (10), require building multiple alignments on sequence databases, can only handle non-synonymous mutations, and are difficult to scale up to many model organism genomes. Nevertheless, for human genomes, SIFT/PolyPhen scores for all possible non-synonymous mutations can be

*To whom correspondence should be addressed. Tel: +1 215 426 1256; Fax: +1 267 426 0363; Email: kai@openbioinformatics.org

Published online 3 July 2010. Nucleic Acids Research, 2010, Vol. 38, No. 16, e164. doi:10.1093/nar/gkq603

© The Author(s) 2010. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
