NoC Architecture Performance Evaluation and Design Trade-offs: Key Techniques and Methods for Multicore Systems


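This paper examines the role of Network-on-Chip (NoC) technology in performance evaluation and design trade-offs for Multiprocessor System-on-Chip (MP-SoC) platforms. As SoC design becomes increasingly constrained by power and wiring, NoC architectures are emerging as a key component of MP-SoC design: they offer a scalable, communication-centric interconnect that provides modularity and explicit parallelism. Their characteristic properties, such as low latency, high throughput, and energy efficiency, make NoCs attractive for meeting the demands of these new systems. The authors propose a comprehensive evaluation methodology for comparing NoC architectures on metrics including, but not limited to, latency, bandwidth, energy consumption, and silicon area. Their analysis exposes a series of trade-offs that arise during design, for example:

1. **Latency versus throughput**: The NoC topology (for example ring, mesh, tree, or star) affects transfer speed and path length, and therefore real-time behavior and overall performance. The choice depends on application requirements: latency-sensitive real-time tasks may favor a low-latency design, whereas high-throughput applications may place more weight on network connectivity.
2. **Energy efficiency**: Designers must balance performance gains against energy consumption. For example, a fully connected design offers lower latency but consumes more energy, whereas a partially connected design may incur higher latency but keeps energy consumption under better control.
3. **Silicon area**: NoC complexity is closely tied to area cost. Adding interconnect resources (more routers and wires) can improve performance, but also means a larger die, which may affect manufacturing cost and heat dissipation.
4. **Scalability and flexibility**: Designers need to consider whether the NoC can accommodate a growing number of processor cores, and whether it can be readily adjusted in hardware or software to handle changing workloads.
5. **Fault tolerance and reliability**: Large-scale interconnect architectures may place higher demands on error handling and redundancy mechanisms, which also need to be considered at design time.

The paper provides a useful framework for understanding and optimizing NoC design decisions, helping engineers find the best balance among performance, cost, and complexity as MP-SoC platforms evolve rapidly.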

architecture exhibits high throughput, low latency, energy efficiency, and low area overhead. In today’s power-constrained environments, it is increasingly critical to be able to identify the most energy-efficient architectures and to be able to quantify the energy-performance trade-offs [3]. Generally, the additional area overhead due to the infrastructure IPs should be reasonably small. We now describe these metrics in more detail.

4.1 Message Throughput

Typically, the performance of a digital communication network is characterized by its bandwidth in bits/sec. However, we are more concerned here with the rate at which message traffic can be sent across the network and, so, throughput is a more appropriate metric. Throughput can be defined in a variety of different ways depending on the specifics of the implementation. For message passing systems, we can define message throughput, TP, as follows:

$$TP = \frac{(\text{Total messages completed}) \times (\text{Message length})}{(\text{Number of IP blocks}) \times (\text{Total time})}, \tag{1}$$

where Total messages completed refers to the number of whole messages that successfully arrive at their destination IPs, Message length is measured in flits, Number of IP blocks is the number of functional IP blocks involved in the communication, and Total time is the time (in clock cycles) that elapses between the occurrence of the first message generation and the last message reception. Thus, message throughput is measured as the fraction of the maximum load that the network is capable of physically handling. An overall throughput of TP = 1 corresponds to all end nodes receiving one flit every cycle. Accordingly, throughput is measured in flits/cycle/IP. Throughput signifies the maximum value of the accepted traffic and it is related to the peak data rate sustainable by the system.
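As a minimal sketch of how (1) might be evaluated from simulation counters, the following Python function computes throughput in flits/cycle/IP. The function name, parameter names, and the sample numbers are illustrative assumptions and do not come from the paper.

```python
def message_throughput(messages_completed: int,
                       message_length_flits: int,
                       num_ip_blocks: int,
                       total_time_cycles: int) -> float:
    """Equation (1): message throughput in flits/cycle/IP."""
    if num_ip_blocks <= 0 or total_time_cycles <= 0:
        raise ValueError("num_ip_blocks and total_time_cycles must be positive")
    return (messages_completed * message_length_flits) / (
        num_ip_blocks * total_time_cycles
    )


# Hypothetical simulation counters, chosen only for illustration.
tp = message_throughput(
    messages_completed=4000,
    message_length_flits=8,
    num_ip_blocks=16,
    total_time_cycles=10_000,
)
print(f"TP = {tp:.3f} flits/cycle/IP")  # TP = 0.200 flits/cycle/IP
```

With these assumed numbers, each IP block accepts on average one flit every five cycles, that is, 20 percent of the load at which every end node receives one flit per cycle.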

4.2 Transport Latency

Transport latency is defined as the time (in clock cycles) that elapses between the occurrence of a message header injection into the network at the source node and the occurrence of a tail flit reception at the destination node [21]. We refer to this simply as latency in the remainder of this paper. In order to reach the destination node from some starting source node, flits must travel through a path consisting of a set of switches and interconnect, called stages. Depending on the source/destination pair and the routing algorithm, each message may have a different latency. There is also some overhead in the source and destination that contributes to the overall latency. Therefore, for a given message i, the latency $L_i$ is:

$$L_i = \text{sender overhead} + \text{transport latency} + \text{receiver overhead}.$$

We use the average latency as a performance metric in our evaluation methodology. Let P be the total number of messages reaching their destination IPs and let $L_i$ be the latency of each message i, where i ranges from 1 to P. The average latency, $L_{avg}$, is then calculated according to the following:

$$L_{avg} = \frac{\sum_{i=1}^{P} L_i}{P}. \tag{2}$$
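The per-message latency and its average in (2) can be sketched in Python as follows. The `MessageTiming` record, its field names, and the sample cycle counts are hypothetical and serve only to illustrate the arithmetic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MessageTiming:
    """Cycle counts for one delivered message (hypothetical record)."""
    sender_overhead: int
    transport_latency: int
    receiver_overhead: int

    @property
    def latency(self) -> int:
        # L_i = sender overhead + transport latency + receiver overhead
        return (self.sender_overhead
                + self.transport_latency
                + self.receiver_overhead)


def average_latency(messages: Sequence[MessageTiming]) -> float:
    """Equation (2): mean latency over the P delivered messages."""
    if not messages:
        raise ValueError("at least one delivered message is required")
    return sum(m.latency for m in messages) / len(messages)


# Illustrative values only: three messages with different path lengths.
msgs = [
    MessageTiming(2, 20, 2),
    MessageTiming(2, 35, 2),
    MessageTiming(2, 26, 2),
]
l_avg = average_latency(msgs)
print(f"L_avg = {l_avg:.1f} cycles")  # L_avg = 31.0 cycles
```

Only messages that actually reach their destination IPs are counted, which matches the definition of P in the text.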

4.3 Energy

When flits travel on the interconnection network, both the interswitch wires and the logic gates in the switches toggle and this will result in energy dissipation. Here, we are concerned with the dynamic energy dissipation caused by the communication process in the network. The flits from the source nodes need to traverse multiple hops consisting of switches and wires to reach destinations. Consequently, we determine the energy dissipated by the flits in each interconnect and switch hop. The energy per flit per hop is given by

$$E_{hop} = E_{switch} + E_{interconnect}, \tag{3}$$

where $E_{switch}$ and $E_{interconnect}$ depend on the total capacitances and signal activity of the switch and each section of interconnect wire, respectively. They are determined as follows:

$$E_{switch} = \alpha_{switch} C_{switch} V^2, \tag{4}$$

$$E_{interconnect} = \alpha_{interconnect} C_{interconnect} V^2. \tag{5}$$

$\alpha_{switch}$, $\alpha_{interconnect}$ and $C_{switch}$, $C_{interconnect}$ are the signal activities and the total capacitances of the switches and wire segments, respectively, and V is the supply voltage. The energy dissipated in transporting a packet consisting of n flits over h hops can be calculated as

$$E_{packet} = n \sum_{j=1}^{h} E_{hop,j}. \tag{6}$$

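Let P be the total number of packets transported, and let $E_{packet_i}$ be the energy dissipated by the ith packet, where i ranges from 1 to P. The average energy per packet, $\bar{E}_{packet}$, is then calculated according to the following equation:

$$\bar{E}_{packet} = \frac{\sum_{i=1}^{P} E_{packet_i}}{P} = \frac{\sum_{i=1}^{P} \left( n_i \sum_{j=1}^{h_i} E_{hop,j} \right)}{P}, \tag{7}$$

where $n_i$ and $h_i$ are the number of flits and the number of hops of the ith packet.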
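The chain from per-hop energy to average packet energy in (3) through (7) can be sketched in Python as below. This sketch assumes the usual dynamic-energy form, activity times capacitance times supply voltage squared, for the switch and wire terms. The function names, the `v_dd` parameter, and all numeric values are illustrative assumptions, not figures from the paper.

```python
from typing import Sequence, Tuple


def hop_energy(alpha_switch: float, c_switch: float,
               alpha_wire: float, c_wire: float, v_dd: float) -> float:
    """Equations (3)-(5): energy per flit per hop, in joules.

    Assumes E = alpha * C * V^2 for both the switch and the wire segment.
    """
    e_switch = alpha_switch * c_switch * v_dd ** 2
    e_interconnect = alpha_wire * c_wire * v_dd ** 2
    return e_switch + e_interconnect


def packet_energy(n_flits: int, hop_energies: Sequence[float]) -> float:
    """Equation (6): n flits, each traversing every hop on the path."""
    return n_flits * sum(hop_energies)


def average_packet_energy(
        packets: Sequence[Tuple[int, Sequence[float]]]) -> float:
    """Equation (7): mean energy over P packets given as (n_i, hop energies)."""
    if not packets:
        raise ValueError("at least one packet is required")
    return sum(packet_energy(n, hops) for n, hops in packets) / len(packets)


# Hypothetical technology numbers: 2 pF switch, 1 pF wire, 1.0 V supply.
e_hop = hop_energy(alpha_switch=0.5, c_switch=2e-12,
                   alpha_wire=0.5, c_wire=1e-12, v_dd=1.0)

# Two packets: 4 flits over 3 hops and 8 flits over 2 hops.
packets = [(4, [e_hop] * 3), (8, [e_hop] * 2)]
avg = average_packet_energy(packets)
print(f"E_hop = {e_hop * 1e12:.1f} pJ, average packet energy = {avg * 1e12:.1f} pJ")
```

Passing a list of per-hop energies rather than a single value keeps the sketch usable for topologies whose wire segments differ in length, since each hop j may then carry its own energy.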

The parameters $\alpha_{switch}$ and $\alpha_{interconnect}$ are those that capture the fact that the signal activities in the switches and the interconnect segments will be data-dependent, e.g., there may be long sequences of 1s or 0s that will not cause any transitions. Any of the different low-power coding techniques [29] aimed at minimizing the number of transitions can be applied to any of the topologies described here. For the sake of simplicity and without loss of generality, we do not consider any specialized coding techniques in our analysis.
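To illustrate why the activity factors are data-dependent, the sketch below estimates an activity value for a sequence of flits as the average fraction of bit lines that toggle between consecutive flits. This particular definition, along with the function name and the sample flit sequences, is an assumption made for illustration. The paper does not state how its activity values were obtained, and other conventions exist, for example counting only 0-to-1 transitions.

```python
from typing import Sequence


def switching_activity(flits: Sequence[int], width: int) -> float:
    """Average fraction of the `width` bit lines that toggle per flit transfer."""
    if width <= 0:
        raise ValueError("width must be positive")
    if len(flits) < 2:
        return 0.0  # a single flit causes no flit-to-flit transitions
    mask = (1 << width) - 1
    toggles = sum(
        bin((a ^ b) & mask).count("1")  # bits that differ between neighbors
        for a, b in zip(flits, flits[1:])
    )
    return toggles / (width * (len(flits) - 1))


# A constant stream never toggles; an alternating stream toggles every line.
constant = switching_activity([0xFF, 0xFF, 0xFF, 0xFF], width=8)
alternating = switching_activity([0x00, 0xFF, 0x00, 0xFF], width=8)
mixed = switching_activity([0b0000, 0b0001, 0b0011], width=4)
print(constant, alternating, mixed)  # 0.0 1.0 0.25
```

The constant stream corresponds to the long runs of 1s or 0s mentioned above: it contributes no transitions and hence, under this definition, no dynamic energy.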

4.4 Area Requirements

To evaluate the feasibility of these interconnect schemes, we consider their respective silicon area requirements. As the switches form an integral part of the active components, the

1028 IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 8, AUGUST 2005

Fig. 2. Virtual-channel switch.
