
architecture exhibits high throughput, low latency, energy
efficiency, and low area overhead. In today’s power-constrained
environments, it is increasingly critical to
identify the most energy-efficient architectures and
to quantify the energy-performance trade-offs [3].
Generally, the additional area overhead due to the infra-
structure IPs should be reasonably small. We now describe
these metrics in more detail.
4.1 Message Throughput
Typically, the performance of a digital communication
network is characterized by its bandwidth in bits/sec.
However, we are more concerned here with the rate at which
message traffic can be sent across the network, so
throughput is a more appropriate metric. Throughput can be
defined in a variety of different ways depending on the
specifics of the implementation. For message passing
systems, we can define message throughput, TP, as follows:
$$\mathrm{TP} = \frac{(\text{Total messages completed}) \times (\text{Message length})}{(\text{Number of IP blocks}) \times (\text{Total time})}, \tag{1}$$
where Total messages completed refers to the number of whole
messages that successfully arrive at their destination IPs,
Message length is measured in flits, Number of IP blocks is the
number of functional IP blocks involved in the commu-
nication, and Total time is the time (in clock cycles) that
elapses between the occurrence of the first message
generation and the last message reception. Thus, message
throughput is measured as the fraction of the maximum
load that the network is capable of physically handling. An
overall throughput of TP = 1 corresponds to all end nodes
receiving one flit every cycle. Accordingly, throughput is
measured in flits/cycle/IP. Throughput signifies the max-
imum value of the accepted traffic and it is related to the
peak data rate sustainable by the system.
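
As a rough illustration of the throughput definition above, the following Python sketch evaluates it for a single simulation run. The function name, parameter names, and the numbers in the example are hypothetical and are not taken from the paper's experiments.

```python
def message_throughput(messages_completed: int,
                       message_length_flits: int,
                       num_ip_blocks: int,
                       total_time_cycles: int) -> float:
    """Message throughput TP in flits/cycle/IP.

    total_time_cycles is the time between the first message generation
    and the last message reception, in clock cycles.
    """
    if num_ip_blocks <= 0 or total_time_cycles <= 0:
        raise ValueError("IP block count and total time must be positive")
    delivered_flits = messages_completed * message_length_flits
    return delivered_flits / (num_ip_blocks * total_time_cycles)


# Hypothetical run: 16 IP blocks, 4,000 whole messages of 8 flits each
# delivered within 10,000 clock cycles.
tp = message_throughput(4000, 8, 16, 10000)
print(tp)  # 0.2
```

A value of 1.0 would mean that every IP block received one flit in every cycle of the measurement window.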
4.2 Transport Latency
Transport latency is defined as the time (in clock cycles) that
elapses between the occurrence of a message header
injection into the network at the source node and the
occurrence of a tail flit reception at the destination node
[21]. We refer to this simply as latency in the remainder of
this paper. In order to reach the destination node from some
starting source node, flits must travel through a path
consisting of a set of switches and interconnect, called
stages. Depending on the source/destination pair and the
routing algorithm, each message may have a different
latency. There is also some overhead in the source and
destination that contributes to the overall latency.
Therefore, for a given message $i$, the latency $L_i$ is:
$$L_i = \text{sender overhead} + \text{transport latency} + \text{receiver overhead}.$$
We use the average latency as a performance metric in
our evaluation methodology. Let $P$ be the total number of
messages reaching their destination IPs and let $L_i$ be the
latency of each message $i$, where $i$ ranges from 1 to $P$. The
average latency, $L_{\text{avg}}$, is then calculated according to the
following:
$$L_{\text{avg}} = \frac{\sum_{i=1}^{P} L_i}{P}. \tag{2}$$
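
The per-message latency and its average can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function names and the cycle counts in the example are hypothetical, and all three latency components are assumed to be measured in clock cycles.

```python
def message_latency(sender_overhead: int,
                    transport_latency: int,
                    receiver_overhead: int) -> int:
    """Latency of one message: the sum of its three components (in cycles)."""
    return sender_overhead + transport_latency + receiver_overhead


def average_latency(latencies: list[int]) -> float:
    """Average latency over the P messages that reached their destinations."""
    if not latencies:
        raise ValueError("at least one delivered message is required")
    return sum(latencies) / len(latencies)


# Hypothetical (sender, transport, receiver) cycle counts for three messages.
samples = [(2, 30, 2), (2, 42, 2), (2, 36, 2)]
latencies = [message_latency(s, t, r) for s, t, r in samples]
print(latencies)                   # [34, 46, 40]
print(average_latency(latencies))  # 40.0
```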
4.3 Energy
When flits travel on the interconnection network, both the
interswitch wires and the logic gates in the switches toggle
and this will result in energy dissipation. Here, we are
concerned with the dynamic energy dissipation caused by
the communication process in the network. The flits from
the source nodes need to traverse multiple hops consisting
of switches and wires to reach destinations. Consequently,
we determine the energy dissipated by the flits in each
interconnect and switch hop. The energy per flit per hop is
given by
$$E_{\text{hop}} = E_{\text{switch}} + E_{\text{interconnect}}, \tag{3}$$
where $E_{\text{switch}}$ and $E_{\text{interconnect}}$ depend on the total
capacitances and signal activity of the switch and each section of
interconnect wire, respectively. They are determined as
follows:
$$E_{\text{switch}} = \alpha_{\text{switch}} C_{\text{switch}} V^2, \tag{4}$$
$$E_{\text{interconnect}} = \alpha_{\text{interconnect}} C_{\text{interconnect}} V^2. \tag{5}$$
Here, $\alpha_{\text{switch}}$, $\alpha_{\text{interconnect}}$ and $C_{\text{switch}}$, $C_{\text{interconnect}}$ are the signal
activities and the total capacitances of the switches and
wire segments, respectively. The energy dissipated in
transporting a packet consisting of $n$ flits over $h$ hops can
be calculated as
$$E_{\text{packet}} = n \sum_{j=1}^{h} E_{\text{hop},j}. \tag{6}$$
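
The per-hop and per-packet energy expressions above can be illustrated with a short Python sketch. The function names are hypothetical, and the signal activities, capacitances, and voltage used in the example are placeholder values chosen for illustration, not figures from the paper.

```python
def component_energy(activity: float, capacitance: float, voltage: float) -> float:
    """Dynamic energy of one component: activity * capacitance * voltage^2."""
    return activity * capacitance * voltage ** 2


def hop_energy(switch_activity: float, switch_capacitance: float,
               wire_activity: float, wire_capacitance: float,
               voltage: float) -> float:
    """Energy per flit per hop: switch energy plus interconnect energy."""
    return (component_energy(switch_activity, switch_capacitance, voltage)
            + component_energy(wire_activity, wire_capacitance, voltage))


def packet_energy(num_flits: int, hop_energies: list[float]) -> float:
    """Energy of a packet of num_flits flits crossing the given hops."""
    return num_flits * sum(hop_energies)


# Placeholder values (assumptions): activity 0.5, 200 fF switch capacitance,
# 100 fF wire capacitance, 1.0 V.
e_hop = hop_energy(0.5, 200e-15, 0.5, 100e-15, 1.0)

# A 4-flit packet crossing three hops that are assumed to be identical.
e_packet = packet_energy(4, [e_hop] * 3)
print(f"E_hop = {e_hop:.3e} J, E_packet = {e_packet:.3e} J")
```

Passing a list of per-hop energies, rather than a single value multiplied by the hop count, leaves room for hops whose wire segments differ in length and therefore in capacitance.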
Let $P$ be the total number of packets transported, and let
$E_{\text{packet},i}$ be the energy dissipated by the $i$th packet, where $i$
ranges from 1 to $P$. The average energy per packet, $\bar{E}_{\text{packet}}$,
is then calculated according to the following equation:
$$\bar{E}_{\text{packet}} = \frac{\sum_{i=1}^{P} E_{\text{packet},i}}{P} = \frac{\sum_{i=1}^{P} \left( n_i \sum_{j=1}^{h_i} E_{\text{hop},j} \right)}{P}. \tag{7}$$
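
A minimal Python sketch of the averaging step follows, assuming each packet is described by its flit count and the list of per-hop energies along its route. The function name, the data layout, and the numbers (given in arbitrary energy units) are hypothetical.

```python
def average_packet_energy(packets: list[tuple[int, list[float]]]) -> float:
    """Average energy per packet.

    Each entry is (n_i, hop_energies_i): the packet's flit count and the
    energy per flit of every hop it crosses.
    """
    if not packets:
        raise ValueError("at least one packet is required")
    total = sum(n_flits * sum(hops) for n_flits, hops in packets)
    return total / len(packets)


# Two hypothetical packets, energies in arbitrary units:
#   4 flits over 2 hops -> 4 * (1.0 + 1.5)       = 10.0
#   2 flits over 3 hops -> 2 * (2.0 + 2.0 + 1.0) = 10.0
packets = [(4, [1.0, 1.5]), (2, [2.0, 2.0, 1.0])]
print(average_packet_energy(packets))  # 10.0
```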
The parameters $\alpha_{\text{switch}}$ and $\alpha_{\text{interconnect}}$ capture
the fact that the signal activities in the switches and the
interconnect segments will be data-dependent, e.g., there
may be long sequences of 1s or 0s that will not cause any
transitions. Any of the different low-power coding techni-
ques [29] aimed at minimizing the number of transitions can
be applied to any of the topologies described here. For the
sake of simplicity and without loss of generality, we do not
consider any specialized coding techniques in our analysis.
4.4 Area Requirements
To evaluate the feasibility of these interconnect schemes, we
consider their respective silicon area requirements. As the
switches form an integral part of the active components, the
1028 IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 8, AUGUST 2005
Fig. 2. Virtual-channel switch.