the statistics, namely the activity factors of each individual component, to McPAT via the XML interface file. As
shown in Equation 1, the activity factor of a component is the product of the component's access count and the average
Hamming distance of all accesses, normalized by the length of the time interval. By changing the time interval, one can change the granularity
of runtime power computation. If the performance simulator calls McPAT for runtime power computation every cycle,
a cycle-accurate power consumption profile is generated, which is useful for studying real-time power spikes. If
the performance simulator calls McPAT for runtime power computation only after the performance simulation is complete, an averaged
power profile is generated.
\[
\mathrm{ActivityFactor} = \frac{\mathrm{AccessCount} \times \left( \dfrac{\sum_{i=1}^{n} \mathrm{HammingDistance}}{n} \right)}{n} \tag{1}
\]
In Equation 1, n is the cycle count for a given simulation period, AccessCount is the number of accesses to a specific
component during the period, and HammingDistance is the total number of bits flipped between two consecutive accesses.
If the performance simulator cannot track the Hamming Distance, McPAT assumes that all bits are flipped per cycle.
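As a minimal sketch of how Equation 1 could be evaluated on the performance-simulator side, the following Python function computes the activity factor for one component over one interval. The function name, its parameters, and the `width_bits` fallback (used to stand in for "all bits are flipped per cycle") are illustrative assumptions, not part of McPAT's interface.

```python
from typing import Optional, Sequence


def activity_factor(access_count: int,
                    n_cycles: int,
                    hamming_distances: Optional[Sequence[int]] = None,
                    width_bits: Optional[int] = None) -> float:
    """Evaluate Equation 1 for one component over one simulation interval.

    hamming_distances holds one value per cycle (hypothetical layout).
    If it is None, every cycle is assumed to flip all width_bits bits,
    mirroring the fallback described in the text.
    """
    if n_cycles <= 0:
        raise ValueError("n_cycles must be positive")
    if hamming_distances is None:
        if width_bits is None:
            raise ValueError("need hamming_distances or width_bits")
        # Assumption: "all bits flipped" means width_bits flips per cycle.
        hamming_distances = [width_bits] * n_cycles
    if len(hamming_distances) != n_cycles:
        raise ValueError("expected one Hamming distance per cycle")

    # Inner term of Equation 1: average Hamming distance over n cycles.
    avg_hamming = sum(hamming_distances) / n_cycles
    # Outer term: scale by the access count and normalize by n again.
    return access_count * avg_hamming / n_cycles


# Two accesses in a 4-cycle interval with tracked Hamming distances.
tracked = activity_factor(2, 4, hamming_distances=[1, 3, 0, 4])
# Same interval length, untracked distances on a hypothetical 8-bit structure.
untracked = activity_factor(4, 4, width_bits=8)
```

Shrinking `n_cycles` to 1 corresponds to the per-cycle profile described above, while passing the whole run as a single interval yields the averaged profile.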
Ideally, the performance simulator provides the activity factor for each individual component. However,
McPAT can also infer activity factors for components as long as the performance simulator provides the basic
statistics. For example, if the performance simulator can only track the number of
memory instructions rather than detailed activity factors for the load or store queue, McPAT assumes
that each memory instruction involves one read, one write, and one or two search operations (depending on the hardware
specification) on the load and store queues. This assumption is based on the default implementations of the load and
store unit, as discussed later in this report.
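The load/store queue assumption above can be illustrated with a short sketch that expands a memory-instruction count into read, write, and search counts. The names `QueueAccesses`, `estimate_lsq_accesses`, and `searches_per_instruction` are hypothetical and chosen only for this example; McPAT's internal implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class QueueAccesses:
    reads: int
    writes: int
    searches: int


def estimate_lsq_accesses(memory_instructions: int,
                          searches_per_instruction: int = 1) -> QueueAccesses:
    """Derive load/store queue access counts from a memory-instruction count.

    Follows the stated assumption: one read, one write, and one or two
    searches per memory instruction. The number of searches depends on the
    hardware specification and is passed in by the caller.
    """
    if memory_instructions < 0:
        raise ValueError("memory_instructions must be non-negative")
    if searches_per_instruction not in (1, 2):
        raise ValueError("searches_per_instruction must be 1 or 2")
    return QueueAccesses(
        reads=memory_instructions,
        writes=memory_instructions,
        searches=memory_instructions * searches_per_instruction,
    )


# 1000 memory instructions on a design assumed to need two searches each.
estimate = estimate_lsq_accesses(1000, searches_per_instruction=2)
```

The resulting counts would then play the role of AccessCount in Equation 1 for the corresponding queue structures.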
McPAT runs separately from a performance simulator and only reads performance statistics from it; therefore,
its impact on the simulation speed of the native performance simulator is minimal. Although the initialization phase
of McPAT may take some time to complete because of the large search space, it does not affect the simulator speed
significantly, since it needs to be done only once at the beginning of the simulation. During the computation phase, some
simulator overhead may result from the added performance counters.
3 Integrated and Hierarchical Modeling Framework
In order to model the power, area, and timing of a multicore processor, McPAT takes an integrated and hierarchical
approach. It is integrated in that McPAT models power, area, and timing simultaneously. Because of this, McPAT is able
to ensure that the results are mutually consistent from an electrical standpoint. It is hierarchical in that it decomposes
the models into three levels: architectural, circuit, and technology. This provides users with the flexibility to model
a broad range of possible multicore configurations across multiple implementation technologies. Taken together, this
integrated and hierarchical approach enables the user to paint a comprehensive picture of a design space, exploring
tradeoffs between design and technology choices in terms of power, area, and timing.
3.1 Power Modeling
As shown in Equation (2), power dissipation of CMOS circuits has three main components: dynamic, short-circuit, and
leakage power. All three contribute significantly to the total power dissipation of multicore processors fabricated using
a deep-submicron technology.