CACTI-3DD: Architecture-level Modeling
for 3D Die-stacked DRAM Main Memory
Ke Chen‡†, Sheng Li†, Naveen Muralimanohar†, Jung Ho Ahn§, Jay B. Brockman‡, Norman P. Jouppi†
‡University of Notre Dame, †Hewlett-Packard Labs, §Seoul National University
‡{kchen2, jbb}@nd.edu, †{kec, sheng.li4, naveen.muralimanohar, norm.jouppi}@hp.com, §gajh@snu.ac.kr
Abstract—Emerging 3D die-stacked DRAM technology is one of the
most promising solutions for future memory architectures to satisfy the
ever-increasing demands on performance, power, and cost. This paper
introduces CACTI-3DD, the first architecture-level integrated power,
area, and timing modeling framework for 3D die-stacked off-chip DRAM
main memory. CACTI-3DD includes TSV models, improves models for
2D off-chip DRAM main memory over current versions of CACTI, and
includes 3D integration models that enable the analysis of a full spectrum
of 3D DRAM designs from coarse-grained rank-level 3D stacking to bank-
level 3D stacking. CACTI-3DD enables an in-depth study of architecture-
level tradeoffs of power, area, and timing for 3D die-stacked DRAM
designs. We demonstrate the utility of CACTI-3DD in analyzing design
tradeoffs of emerging 3D die-stacked DRAM main memories. We find
that a coarse-grained 3D DRAM design that stacks canonical DRAM dies
achieves only marginal benefits in power, area, and timing compared
to the original 2D design. To fully leverage the huge internal bandwidth of
TSVs, DRAM dies must be re-architected, and system implications must
be considered when building 3D DRAMs with redesigned 2D planar
DRAM dies. Our results show that the 3D DRAM with re-architected
DRAM dies achieves significant improvements in power and timing
compared to the coarse-grained 3D die-stacked DRAM.
Keywords: 3D architecture, DRAM, TSV, Main memory, Modeling
I. INTRODUCTION
Modern computer systems place ever-increasing demands on the performance,
power efficiency, and capacity of Dynamic Random Access Memories
(DRAMs). As Moore’s Law drives CMOS technology into the deep
nanoscale regime, DRAM scaling faces serious challenges in speed,
bandwidth, capacity, and cost. For example, CPU performance has
historically improved at an annual rate of 55%, whereas memory access
time has improved by only 10% per year, resulting in the well-known
memory wall problem [6]. Moreover, DRAM capacity increased 4× every
3 years for decades but is now scaling much more slowly [6], resulting
in a memory capacity wall problem. The power and cost of DRAMs face
similar challenges [6].
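Because these rates compound, the gap grows geometrically: if the 55% and 10% annual rates quoted above stayed constant, the CPU-to-memory ratio after n years would be (1.55/1.10)^n. The Python sketch below does this arithmetic and also converts "4× every 3 years" into an equivalent annual rate. The constant-rate assumption and the helper names `performance_gap` and `annualized_rate` are ours, for illustration only.

```python
# Back-of-the-envelope check of the growth rates quoted in the text.
# Assumption (illustrative only): the annual rates stay constant over time.


def performance_gap(years: int, cpu_rate: float = 0.55, mem_rate: float = 0.10) -> float:
    """Ratio of cumulative CPU improvement to memory improvement after `years`."""
    return ((1 + cpu_rate) / (1 + mem_rate)) ** years


def annualized_rate(factor: float, period_years: float) -> float:
    """Equivalent compound annual growth for `factor`x growth every `period_years`."""
    return factor ** (1 / period_years) - 1


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(f"after {n:2d} years the CPU/memory gap is {performance_gap(n):.1f}x")
    # "4x every 3 years" expressed as a yearly compound rate
    print(f"4x every 3 years is about {annualized_rate(4, 3):.0%} per year")
```

Under these assumptions the gap is roughly 5.6× after 5 years and roughly 31× after 10 years, and 4× every 3 years corresponds to about 59% growth per year.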
978-3-9810801-8-6/DATE12/©2012 EDAA
Ke Chen and Jay Brockman are partially supported by the C2S2 Focus Center,
one of six research centers funded under the Focus Center Research Program
(FCRP), a Semiconductor Research Corporation entity.
This material is based upon work supported by the Department of Energy
under Award Number DE-SC0005026. The disclaimer can be found at
http://www.hpl.hp.com/DoE-Disclaimer.html
Jung Ho Ahn was supported by the Smart IT Convergence System Research
Center funded by the Ministry of Education, Science and Technology as a
Global Frontier Project.

TABLE I
DRAM TECHNOLOGY/ARCHITECTURE INNOVATIONS AND THE
TECHNOLOGY GENERATIONS WHEN THE INNOVATIONS WERE (WILL BE)
WIDELY USED. THE LAST TWO ROWS ARE FUTURE MILESTONES
ACCORDING TO ITRS [14]. THE SELECTED REFERENCES ARE
REPRESENTATIVE DESIGNS BUT NOT NECESSARILY THE FIRST DESIGNS.
Innovation | Technology generation (nm)
Hierarchical wordline [7] | 200
Interface from DDR to DDR2, DDR3 [5], [8] | 100 & 65
Varying number of cells per bitline [18] | 90
Cell size from 8F² to 6F² and 4F² [14] | 65 & 36
3D stacking [17] | 50
Copper metallization [14] | 44
High-k dielectric gate oxide [14] | 31

The DRAM industry has continued to innovate in both technology
and architecture to scale the performance, power, capacity, and cost
of DRAMs, as shown in Table I. New materials and fabrication
processes have been steadily introduced. Hierarchical wordlines and
datalines were incorporated to maintain a steady trend
of increasing DRAM capacity [7]. To lower cost and increase density,
DRAM vendors are shrinking the cell size from 8F² to 6F² and then
to 4F² [14]. The memory interface itself has seen numerous advances,
from SDRAM through the DDR generations to the upcoming DDR4
standard. The wide adoption of multicore processors makes memory
bandwidth and capacity even more critical. Indicative of this trend,
traditionally conservative DRAM manufacturers are adopting more
radical technologies such as 3D die-stacked DRAM [10], [17].
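Cell sizes quoted as 8F², 6F², and 4F² are multiples of the squared feature size F, so at a fixed F the raw cell-array density scales as 8/k relative to an 8F² cell: 1.33× for 6F² and 2× for 4F². The Python sketch below illustrates this. The 30 nm feature size is a hypothetical value chosen for illustration, and the density counts cell arrays only, ignoring peripheral circuits.

```python
# Raw cell-array density for a k*F^2 DRAM bit cell.
# Only the 8/6/4 cell factors come from the text; the feature size is hypothetical.

NM2_PER_MM2 = 1e12  # 1 mm = 1e6 nm, so 1 mm^2 = 1e12 nm^2


def cell_area_nm2(cell_factor: float, feature_nm: float) -> float:
    """Area of one bit cell in nm^2 for a cell_factor * F^2 cell."""
    return cell_factor * feature_nm ** 2


def cell_bits_per_mm2(cell_factor: float, feature_nm: float) -> float:
    """Bits per mm^2 of pure cell array (peripheral circuits ignored)."""
    return NM2_PER_MM2 / cell_area_nm2(cell_factor, feature_nm)


if __name__ == "__main__":
    feature_nm = 30.0  # hypothetical feature size, illustration only
    baseline = cell_bits_per_mm2(8, feature_nm)
    for k in (8, 6, 4):
        density = cell_bits_per_mm2(k, feature_nm)
        print(f"{k}F^2: {density / 1e9:.3f} Gbit/mm^2, "
              f"{density / baseline:.2f}x vs 8F^2")
```

The relative gains do not depend on the chosen F; only the absolute densities do.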
Despite technology advances from the DRAM industry, the ability
to propose and evaluate new DRAM designs and their system
implications is currently limited by the availability and quality of
appropriate system-level tools. CACTI-D [15] built a solid foundation
for modeling DRAM technologies, including cells and DRAM
subarrays. However, since its peripheral circuit models, including
the control path and data path, were inherited from SRAM models,
CACTI-D’s overall DRAM model is more appropriate for embedded
DRAM than for off-chip DRAM main memory. Correct modeling of the
peripheral circuits, including hierarchical wordlines and datalines, is
essential, since peripheral circuits play a major role in determining
the overall power, area, and timing. For example, modern DRAM
designs usually achieve an area efficiency of only about 50% [14],
so roughly half of the die area lies outside the cell arrays.
Vogelsang [18] partially addressed this problem by developing a
power model with detailed peripheral circuit models for DRAM
main memories. However, this model estimates only power, not the
timing or area of a DRAM design, which is insufficient because
power, area, and timing are inseparable in modern DRAMs.
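The approximately 50% area efficiency cited above for modern DRAM designs [14] translates directly into die area. The Python sketch below assumes area efficiency is defined as total cell-array area divided by die area. The helper name `die_area_mm2` and the example parameters (a 4 Gb die built from 6F² cells at a 30 nm feature size) are hypothetical and not taken from this paper.

```python
# Die-area estimate from capacity, cell size, and area efficiency.
# Assumption: area_efficiency = (total cell-array area) / (die area).


def die_area_mm2(capacity_bits: int, cell_factor: float, feature_nm: float,
                 area_efficiency: float = 0.5) -> float:
    """Estimated die area in mm^2 for capacity_bits cells of size cell_factor*F^2."""
    if not 0 < area_efficiency <= 1:
        raise ValueError("area_efficiency must be in (0, 1]")
    # cell_factor * F^2 is in nm^2; divide by 1e12 to convert nm^2 to mm^2
    cell_area_mm2 = capacity_bits * cell_factor * feature_nm ** 2 / 1e12
    return cell_area_mm2 / area_efficiency


if __name__ == "__main__":
    bits = 4 * 2 ** 30  # hypothetical 4 Gb die
    cells_only = die_area_mm2(bits, 6, 30.0, area_efficiency=1.0)
    die = die_area_mm2(bits, 6, 30.0)  # ~50% efficiency, as cited in the text
    print(f"cell arrays: {cells_only:.1f} mm^2, full die: {die:.1f} mm^2")
    print(f"area outside the cell arrays: {die - cells_only:.1f} mm^2")
```

At 50% efficiency, the area outside the cell arrays equals the cell-array area itself. This is why a model that treats peripheral circuits loosely can misestimate area, and with it wire lengths, timing, and power, by a large margin.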
Especially important today is the ability to model emerging 3D
die-stacked DRAM technology [10], [17], which shows tremendous
promise for addressing performance, power, and capacity challenges
in the near future. Tsai et al. [16] extended an earlier CACTI
version to model 3D die-stacked SRAMs. However, 3D DRAM differs
substantially from 3D SRAM in memory cell physics, fabrication
technology, circuit implementation, memory organization, and
peripheral circuit arrangement. Thus, a 3D SRAM model provides an
inadequate basis for modeling 3D die-stacked DRAM [10], [17].