Yardımcı and Noble Genome Biology
(2017) 18:26
DOI 10.1186/s13059-017-1161-y
REVIEW Open Access
Software tools for visualizing Hi-C data
Galip Gürkan Yardımcı¹ and William Stafford Noble²*
Abstract
High-throughput assays for measuring the
three-dimensional (3D) configuration of DNA have
provided unprecedented insights into the relationship
between DNA 3D configuration and function. Data
interpretation from assays such as ChIA-PET and Hi-C is
challenging because the data is large and cannot be
easily rendered using standard genome browsers. An
effective Hi-C visualization tool must provide several
visualization modes and be capable of viewing the
data in conjunction with existing, complementary data.
We review five software tools that do not require
programming expertise. We summarize their
complementary functionalities, and highlight which
tool is best equipped for specific tasks.
Introduction
The three-dimensional (3D) conformation of the genome
in the nucleus influences many key biological processes,
such as transcriptional regulation and DNA replication
timing. Over the past decade, chromosome conforma-
tion capture assays have been developed to characterize
3D contacts associated with a single locus (chromosome
conformation capture (3C), chromosome conformation
capture-on-chip (4C)) [1–3], a set of loci (chromosome
conformation capture carbon copy (5C), chromatin inter-
action analysis by paired-end tag sequencing (ChIA-PET))
[4, 5] or the whole genome (Hi-C) [6]. Using these assays,
researchers have profiled the conformation of chromatin
in a variety of organisms and systems, which has revealed
a hierarchical, domain-like organization of chromatin.
*Correspondence: william-noble@uw.edu
² Department of Genome Sciences, Department of Computer Science and
Engineering, University of Washington, 3720 15th Ave NE, Seattle, WA 98105,
USA
Full list of author information is available at the end of the article
Here, we focus on the Hi-C assay and variants thereof,
which provide a genome-wide view of chromosome conformation.
The assay consists of five steps: (1) crosslinking
DNA with formaldehyde, (2) cleaving cross-linked DNA
with an endonuclease, (3) ligating the ends of cross-linked
fragments to form a circular molecule marked with biotin,
(4) shearing circular DNA and pulling down fragments
marked with biotin, and (5) paired-end sequencing of the
pulled-down fragments. A pair of sequence reads from
a single ligated molecule map to two distinct regions of
the genome, and the abundance of such fragments pro-
vides a measure of how frequently, within a population
of cells, the two loci are in contact. Thus, by contrast
with assays such as DNase-seq and chromatin immuno-
precipitation sequencing (ChIP-seq) [7, 8], which yield a
one-dimensional count vector across the genome, the out-
put of Hi-C is a two-dimensional matrix of counts, with
one entry for each pair of genomic loci. Production of
this matrix involves a series of filtering and normalization
steps (reviewed in [9] and [10]).
A critical parameter in Hi-C analysis pipelines is the
effective resolution at which the data is analyzed [10, 11].
In this context, “resolution” simply refers to the size of
the loci for which Hi-C counts are aggregated. At present,
deep sequencing to achieve very high resolution data for
large genomes is prohibitively expensive. A basepair resolution
analysis of the human genome would require the
aggregation of counts across a matrix of size approximately
$(3 \times 10^{9})^{2} = 9 \times 10^{18}$. Reads that fall within a
contiguous genomic window are binned together, which
reduces the size and sparsity of the matrix at the cost of
resolution. Following this process, Hi-C data can be represented
as a “contact matrix” $M$, where entry $M_{ij}$ is the
number of Hi-C read pairs, or contacts, between genomic
locations designated by bin i and bin j.
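To make the binning step concrete, the following minimal Python sketch aggregates read pairs into a contact matrix at a chosen resolution, and computes how the number of matrix entries shrinks as the bin size grows. The function names (`bin_contacts`, `n_matrix_entries`), the toy coordinates, and the restriction to a single chromosome are illustrative assumptions; they are not taken from any of the tools reviewed here.

```python
import numpy as np


def n_matrix_entries(genome_length, resolution):
    """Number of entries in a square contact matrix at a given bin size (bp)."""
    n_bins = -(-genome_length // resolution)  # ceiling division
    return n_bins ** 2


def bin_contacts(read_pairs, chrom_length, resolution):
    """Aggregate read pairs into a symmetric contact matrix M.

    read_pairs   -- iterable of (pos_a, pos_b), 0-based coordinates on one
                    chromosome (hypothetical input format)
    chrom_length -- chromosome length in base pairs
    resolution   -- bin width in base pairs
    """
    n_bins = -(-chrom_length // resolution)
    matrix = np.zeros((n_bins, n_bins), dtype=np.int64)
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // resolution, pos_b // resolution
        matrix[i, j] += 1
        if i != j:
            # Keep M symmetric: a contact between bins i and j is unordered.
            matrix[j, i] += 1
    return matrix


# Toy example: four read pairs on a 5-kb "chromosome", binned at 1 kb.
pairs = [(120, 4_800), (950, 4_100), (2_300, 2_700), (150, 300)]
M = bin_contacts(pairs, chrom_length=5_000, resolution=1_000)
print(M)

# Matrix size for a 3-Gb genome at basepair and at 1-Mb resolution.
print(n_matrix_entries(3_000_000_000, 1))
print(n_matrix_entries(3_000_000_000, 1_000_000))
```

In this sketch, a 3 × 10^9 bp genome yields 9 × 10^18 entries at basepair resolution but only 9 × 10^6 entries with 1-Mb bins, which illustrates the trade-off between matrix size and resolution described above.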
Hi-C data presents substantial analytical challenges for
researchers who study chromatin conformation. Filtering
and normalization strategies can be employed to correct
experimental artifacts and biases [9–11]. Statistical confi-
dence measures can be estimated to identify sets of high
confidence contacts [12]. Hi-C data can be compared
with and correlated against complementary data sets mea-
suring protein–DNA interactions, gene expression, and
replication timing [13–15]. Finally, the 3D conformation of the
DNA itself can be estimated from Hi-C data, with the
potential to consider data derived from other assays or
from multiple experimental conditions [16–19].
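As one illustration of what a normalization step can look like, the sketch below applies a simple iterative matrix-balancing scheme, which rescales a symmetric contact matrix until all rows have the same total. This is a generic example and is not claimed to be the specific procedure of references [9–11]; the function name `balance`, the toy matrix, and the simulated per-bin bias are assumptions made for illustration. The sketch also assumes that every bin has at least one contact; real pipelines typically filter low-coverage bins first.

```python
import numpy as np


def balance(matrix, n_iter=50):
    """Iteratively rescale a symmetric matrix so that all row sums are equal.

    Returns the balanced matrix and the accumulated per-bin bias vector.
    Assumes no row sums to zero (hypothetical simplification).
    """
    m = matrix.astype(float).copy()
    bias = np.ones(m.shape[0])
    for _ in range(n_iter):
        row_sums = m.sum(axis=1)
        scale = row_sums / row_sums.mean()  # relative coverage of each bin
        m /= np.outer(scale, scale)         # divide entry (i, j) by s_i * s_j
        bias *= scale
    return m, bias


# Toy example: a symmetric "true" signal distorted by a per-bin bias.
true_signal = np.array([[10.0, 4.0, 2.0],
                        [4.0, 10.0, 4.0],
                        [2.0, 4.0, 10.0]])
simulated_bias = np.array([1.0, 2.0, 0.5])
raw = true_signal * np.outer(simulated_bias, simulated_bias)

balanced, bias = balance(raw)
print(balanced.sum(axis=1))  # row sums are (approximately) equal
```

Dividing the raw matrix by the outer product of the returned bias vector reproduces the balanced matrix, so the bias vector can be stored alongside the raw counts rather than overwriting them.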
© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.