• Sub-arrays - A data or tag array is divided into a number of sub-arrays to reduce wordline and
bitline delay. Unlike banks, these sub-arrays together support only a single access at any given
time. The total number of sub-arrays in a cache is equal to the product of Ndwl and Ndbl.
• Mat - A group of four sub-arrays (2x2) that share a common central predecoder. CACTI’s
exhaustive search starts from a minimum of one mat.
• Sub-bank - In a typical cache, a cache block is scattered across multiple sub-arrays to improve
the reliability of a cache. Irrespective of the cache organization, CACTI assumes that every cache
block in a cache is distributed across an entire row of mats and the row number corresponding to
a particular block is determined based on the block address. Each row (of mats) in an array is
referred to as a sub-bank.
• Ntwl/Ndwl - Number of horizontal partitions in a tag or data array i.e., the number of segments
that a single wordline is partitioned into.
• Ntbl/Ndbl - Number of vertical partitions in a tag or data array i.e., the number of segments that a
single bitline is partitioned into.
• Ntspd/Nspd - Number of sets stored in each row of a sub-array. For given values of Ndwl and
Ndbl, Nspd determines the aspect ratio of the sub-array.
• Ntcm/Ndcm - Degree of bitline multiplexing.
• Ntsam/Ndsam - Degree of sense-amplifier multiplexing.
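The relationships among these partitioning parameters can be sketched in a few lines of Python. This is an illustrative sketch, not CACTI's own code: the class and function names are hypothetical, the sub-bank count assumes that mats tile the array as an (Ndwl/2) x (Ndbl/2) grid, and the power-of-two enumeration is likewise an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayPartition:
    """Partitioning of one data (or tag) array; names are hypothetical."""
    ndwl: int  # number of segments each wordline is split into
    ndbl: int  # number of segments each bitline is split into

    def num_subarrays(self) -> int:
        # From the text: the sub-array count is the product Ndwl * Ndbl.
        return self.ndwl * self.ndbl

    def num_mats(self) -> int:
        # From the text: a mat groups four sub-arrays (2x2).
        return self.num_subarrays() // 4

    def num_subbanks(self) -> int:
        # Assumption: a row of mats spans two rows of sub-arrays,
        # so there are Ndbl / 2 rows of mats (sub-banks).
        return self.ndbl // 2


def candidate_partitions(max_ndwl: int, max_ndbl: int):
    """Yield partitions that contain at least one full 2x2 mat.

    Assumption: Ndwl and Ndbl are restricted to powers of two here.
    """
    ndwl = 2
    while ndwl <= max_ndwl:
        ndbl = 2
        while ndbl <= max_ndbl:
            yield ArrayPartition(ndwl, ndbl)
            ndbl *= 2
        ndwl *= 2
```

For example, `ArrayPartition(ndwl=4, ndbl=8)` describes 32 sub-arrays grouped into 8 mats, and the smallest candidate, `ArrayPartition(2, 2)`, is exactly one mat.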
3 New features in CACTI 6.0
CACTI 6.0 comes with a number of new features, most of which are aimed at improving the tool’s
ability to model large caches.
• Incorporation of many different wire models for the inter-bank network: local/intermediate/global
wires, repeater sizing/spacing for optimal delay or power, low-swing differential wires.
• Incorporation of models for router components (buffers, crossbar, arbiter).
• Introduction of grid topologies for NUCA and a shared bus architecture for UCA with low-swing
wires.
• An algorithm for design space exploration that models different grid layouts and estimates average
bank and network latency. The design space exploration also considers different wire and router
types.
• The introduction of empirical network contention models to estimate the impact of network
configuration, bank cycle time, and workload on average cache access delay.
• An improved and more accurate wordline and bitline delay model.
• A validation analysis of all new circuit models: low-swing differential wires, distributed RC model
for wordlines and bitlines within cache banks (router components have been validated elsewhere).
• An improved interface that enables trade-off analysis for latency, power, cycle time, and area.
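To illustrate what a trade-off analysis over latency, power, cycle time, and area might look like, the following Python sketch picks one design from a set of candidates using a weighted sum of normalized metrics. It is only one plausible formulation and is not taken from CACTI: the `Design` fields, the `pick_design` function, the weights, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One candidate cache organization (all values are made up)."""
    name: str
    latency_ns: float
    power_mw: float
    cycle_time_ns: float
    area_mm2: float


METRICS = ("latency_ns", "power_mw", "cycle_time_ns", "area_mm2")


def pick_design(designs, weights):
    """Return the design with the lowest weighted sum of normalized metrics."""
    # Normalize each metric by the best (smallest) value among the
    # candidates so that metrics with different units can be combined.
    best = {m: min(getattr(d, m) for d in designs) for m in METRICS}

    def cost(d):
        return sum(weights[m] * getattr(d, m) / best[m] for m in METRICS)

    return min(designs, key=cost)


designs = [
    Design("fast", latency_ns=1.0, power_mw=200.0, cycle_time_ns=0.5, area_mm2=4.0),
    Design("frugal", latency_ns=2.0, power_mw=100.0, cycle_time_ns=1.0, area_mm2=2.0),
]

# Weighting only latency favors "fast"; weighting only power favors "frugal".
latency_only = {"latency_ns": 1.0, "power_mw": 0.0, "cycle_time_ns": 0.0, "area_mm2": 0.0}
power_only = {"latency_ns": 0.0, "power_mw": 1.0, "cycle_time_ns": 0.0, "area_mm2": 0.0}
```

Changing the weights shifts which candidate wins, which is the essence of a trade-off interface: the user states how much each metric matters, and the selection follows from that.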