Genome Biology 2008, 9:R137
Open Access
Zhang et al. 2008, Volume 9, Issue 9, Article R137
Model-based Analysis of ChIP-Seq (MACS)
Yong Zhang, Tao Liu, Clifford A Meyer, Jérôme Eeckhoute, David S Johnson, Bradley E Bernstein, Chad Nusbaum, Richard M Myers, Myles Brown, Wei Li and X Shirley Liu
Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, 44 Binney Street, Boston, MA 02115, USA.
Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute and Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 44 Binney Street, Boston, MA 02115, USA.
Gene Security Network, Inc., 2686 Middlefield Road, Redwood City, CA 94063, USA.
Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital and Department of Pathology, Harvard Medical School, 13th Street, Charlestown, MA 02129, USA.
Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA.
Department of Genetics, Stanford University Medical Center, Stanford, CA 94305, USA.
Division of Biostatistics, Dan L Duncan Cancer Center, Department of Molecular and Cellular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
¤ These authors contributed equally to this work.
Correspondence: Wei Li. Email: wl1@bcm.edu. X Shirley Liu. Email: xsliu@jimmy.harvard.edu
© 2008 Zhang et al.; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ChIP-Seq analysis
MACS performs model-based analysis of ChIP-Seq data generated by short read sequencers.
We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short
read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of
ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also
uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for
more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms,
and is freely available.
The determination of the 'cistrome', the genome-wide set of
in vivo cis-elements bound by trans-factors [1], is necessary
to determine the genes that are directly regulated by those
trans-factors. Chromatin immunoprecipitation (ChIP) [2]
coupled with genome tiling microarrays (ChIP-chip) [3,4]
and sequencing (ChIP-Seq) [5-8] have become popular techniques
to identify cistromes. Although early ChIP-Seq efforts
were limited by sequencing throughput and cost [2,9], tremendous
progress has been achieved in the past year in the
development of next generation massively parallel sequencing.
Tens of millions of short tags (25-50 bases) can now be
simultaneously sequenced at less than 1% of the cost of traditional
Sanger sequencing methods. Technologies such as Illumina's
Solexa or Applied Biosystems' SOLiD™ have made
ChIP-Seq a practical and potentially superior alternative to
ChIP-chip [5,8].
While providing several advantages over ChIP-chip, such as
less starting material, lower cost, and higher peak resolution,
ChIP-Seq also poses challenges (or opportunities) in the analysis
of data. First, ChIP-Seq tags represent only the ends of
the ChIP fragments, instead of precise protein-DNA binding
sites. Although tag strand information and the approximate
distance to the precise binding site could help improve peak
resolution, a good tag-to-site distance estimate is often
Published: 17 September 2008
Genome Biology 2008, 9:R137 (doi:10.1186/gb-2008-9-9-r137)
Received: 4 August 2008
Revised: 3 September 2008
Accepted: 17 September 2008
The electronic version of this article is the complete one and can be
found online at http://genomebiology.com/2008/9/9/R137