MACS: Model-based Analysis of ChIP-Seq


"Model-based Analysis of ChIP-Seq (MACS)" 是一种用于分析染色质免疫共沉淀测序（ChIP-Seq）数据的统计方法，旨在高效准确地识别DNA序列上的结合位点。这项技术由Yong Zhang等人在2008年发表于《Genome Biology》杂志上，文章编号为R137。 MACS是一种基于模型的分析方法，其核心目标是利用ChIP-Seq产生的高通量序列数据来定位蛋白质-DNA相互作用的精确位置。ChIP-Seq技术通过结合特定蛋白质（如转录因子或组蛋白修饰酶）与DNA，然后对捕获的片段进行测序，从而揭示这些蛋白质在基因组中的结合模式。MACS算法考虑了ChIP-Seq数据的特点，如峰的形状、测序深度和噪声水平，以提高定位信号峰的准确性和可靠性。该方法的关键步骤包括： 1. **数据预处理**：首先，MACS会去除低质量的序列读取，并将剩下的序列映射到参考基因组上，以确定它们的精确位置。 2. **峰检测**：MACS采用一个滑动窗口策略，比较相邻区域的信号强度，寻找显著高于背景的区域，这些区域可能对应于蛋白质的结合位点。它使用一个动态建模过程来适应不同峰的形状和大小。 3. **峰呼叫**：MACS通过比较处理后的信号与随机模拟的背景信号，计算每个候选峰的p值，以评估其显著性。它还利用一种称为“广义泊松混合模型”的统计模型来区分真实信号和噪声。 4. **峰定位和宽度估计**：MACS通过优化峰的边界来精确确定峰的位置，并估计峰的宽度，这有助于理解蛋白质结合的特异性。 5. **富集区域的评估和注释**：最后，识别出的峰会被与基因组特征（如启动子、增强子、基因座等）关联，以理解蛋白质结合的生物学意义。 MACS的优势在于其能够处理大规模的ChIP-Seq数据，同时提供了一种定量的方法来评估结合位点的显著性。此外，MACS2，作为MACS的更新版本，引入了更多的改进，如支持多因素分析、增加了峰合并和分割功能，以及优化了计算性能。在实际应用中，MACS已被广泛用于研究各种生物过程，如转录因子的调控网络、组蛋白修饰模式以及DNA甲基化的分布等。通过MACS分析，科学家可以深入理解基因表达调控和表观遗传学的复杂性，为疾病研究和药物发现提供了强大的工具。


Genome Biology 2008, 9:R137

Open Access

2008 Zhang et al. Volume 9, Issue 9, Article R137

Method

Model-based Analysis of ChIP-Seq (MACS)

Yong Zhang, Tao Liu, Clifford A Meyer, Jérôme Eeckhoute†, David S Johnson‡, Bradley E Bernstein§¶, Chad Nusbaum, Richard M Myers, Myles Brown†, Wei Li and X Shirley Liu

Addresses:

Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, 44 Binney Street, Boston, MA 02115, USA.

†Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute and Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 44 Binney Street, Boston, MA 02115, USA.

‡Gene Security Network, Inc., 2686 Middlefield Road, Redwood City, CA 94063, USA.

Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital and Department of Pathology, Harvard Medical School, 13th Street, Charlestown, MA 02129, USA.

Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA, 02142, USA.

Department of Genetics, Stanford University Medical Center, Stanford, CA 94305, USA.

Division of Biostatistics, Dan L Duncan Cancer Center, Department of Molecular and Cellular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.

¤ These authors contributed equally to this work.

Correspondence: Wei Li. Email: wl1@bcm.edu. X Shirley Liu. Email: xsliu@jimmy.harvard.edu

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

ChIP-Seq analysis: MACS performs model-based analysis of ChIP-Seq data generated by short read sequencers.

Abstract

We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.
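The shift-size idea in the abstract can be illustrated with a simplified sketch. Tags on the forward strand pile up upstream of a binding site and tags on the reverse strand pile up downstream, so the offset between the two pileups estimates the fragment size d, and moving every tag by d/2 toward its 3' end centers both pileups on the site. The helper names `estimate_d` and `shift_tags` are hypothetical, and the sketch uses per-strand medians over a few high-confidence regions as a stand-in for the paper's model-building procedure, which is an assumption made here for brevity.

```python
from statistics import median

def estimate_d(peaks):
    """Estimate the forward/reverse tag offset d from high-confidence regions.

    peaks -- list of (forward_positions, reverse_positions), one pair per region,
             where positions are the 5' end coordinates of the tags
    """
    offsets = [median(rev) - median(fwd) for fwd, rev in peaks if fwd and rev]
    if not offsets:
        raise ValueError("need at least one region with tags on both strands")
    return median(offsets)

def shift_tags(tags, d):
    """Move each tag d/2 toward its 3' end.

    tags -- iterable of (position, strand) with strand '+' or '-'
    """
    half = d / 2
    # '+' tags move to higher coordinates, '-' tags to lower coordinates.
    return [pos + half if strand == "+" else pos - half for pos, strand in tags]

# Usage with made-up coordinates for two regions.
regions = [
    ([100, 110, 120], [200, 210, 220]),
    ([1000, 1010, 1020], [1100, 1110, 1120]),
]
d = estimate_d(regions)
shifted = shift_tags([(110, "+"), (210, "-")], d)
print(d, shifted)
```

After shifting, the forward and reverse tags of the first region land on the same coordinate, which is why the shifted pileup gives a sharper estimate of the binding position than either strand alone.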

Background

The determination of the 'cistrome', the genome-wide set of in vivo cis-elements bound by trans-factors [1], is necessary to determine the genes that are directly regulated by those trans-factors. Chromatin immunoprecipitation (ChIP) [2] coupled with genome tiling microarrays (ChIP-chip) [3,4] and sequencing (ChIP-Seq) [5-8] have become popular techniques to identify cistromes. Although early ChIP-Seq efforts were limited by sequencing throughput and cost [2,9], tremendous progress has been achieved in the past year in the development of next generation massively parallel sequencing. Tens of millions of short tags (25-50 bases) can now be simultaneously sequenced at less than 1% the cost of traditional Sanger sequencing methods. Technologies such as Illumina's Solexa or Applied Biosystems' SOLiD™ have made ChIP-Seq a practical and potentially superior alternative to ChIP-chip [5,8].

While providing several advantages over ChIP-chip, such as less starting material, lower cost, and higher peak resolution, ChIP-Seq also poses challenges (or opportunities) in the analysis of data. First, ChIP-Seq tags represent only the ends of the ChIP fragments, instead of precise protein-DNA binding sites. Although tag strand information and the approximate distance to the precise binding site could help improve peak resolution, a good tag to site distance estimate is often

Published: 17 September 2008

Genome Biology 2008, 9:R137 (doi:10.1186/gb-2008-9-9-r137)

Received: 4 August 2008

Revised: 3 September 2008

Accepted: 17 September 2008

The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2008/9/9/R137
