A PIPP Model for Ultra-Efficient TOPS-Level DSPs: Integrated Performance and Power Analysis

Research paper


Chinese Journal of Electronics

Vol.22, No.4 Oct. 2013

Parameterized Integrated Power and Performance (PIPP) Model for Ultra High-Performance of TOPS level DSP∗

YANG Hui, CHEN Shuming and WU Tiebin

(School of Computer Science, National University of Defense Technology, Changsha 410073, China)

Abstract: Amdahl's law is a simple and fundamental tool for understanding how performance evolves as a function of parallelism. Following a recent trend of applying this law to the timing and power analysis of general-purpose many-core chips, we develop a novel PIPP analytical model for evaluating the performance and power of hierarchical on-chip large-scale parallel architectures, taking into account the core number, super-node size, processing element number, and function unit number. We then investigate the influence of workload characteristics (thread-level parallelism, TLP; instruction-level parallelism, ILP; and data-level parallelism, DLP) on resource allocation under performance and power constraints. The results provide some feasible options for designing a TOPS-level DSP architecture, as well as a theoretical basis for making the design more scalable.

Key words: Hierarchical architecture, Data-level parallel (DLP), Thread-level parallel (TLP), Instruction-level parallel (ILP), Model.

I. Introduction

DSPs are widely used in the embedded field. To meet the requirements of software radio [1], DSP performance has to reach 10 TIPS by 2020 [2,3]. It is therefore urgent to build a TOPS-level DSP on a single chip.

Hierarchical architectures that combine very long instruction word (VLIW), single instruction multiple data (SIMD), tightly coupled super-node, and multi-core techniques can fully exploit the parallelism of applications at lower hardware cost, and have been broadly adopted in current DSPs [2]. However, power scales at a higher pace than performance.

One of the main objectives of a system designer is to assess the impact of architecture choices on the variable to be optimized, from the highest levels of the design flow downwards. Current design methodologies follow two main strategies. The first is instruction set simulation (ISS) and cycle-accurate simulators [4,5]; these methods are too detailed to explore the system-level design space quickly. The second is analytical models, which can quickly identify advantageous architectures but are not detailed enough. Hill and Marty introduced an analytical model relating processor performance to the number of cores in symmetric, asymmetric, and dynamic multi-core chips [6]. Another approach [7] extended Hill and Marty's model to include energy. Ge [8,9] proposed a power-aware speedup model, intended as a general form of parallel speedup model that supports emerging power-aware architectures.
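
To make the kind of analytical model discussed above concrete, the following is a minimal sketch of the symmetric multi-core speedup from Hill and Marty's model [6]. It is not the PIPP model. The function names are hypothetical, and the square-root rule for single-core performance, perf(r) = sqrt(r), is an assumption commonly used with that model. Here f is the parallelizable fraction, n is the chip budget in base-core equivalents, and r is the number of base-core equivalents spent on each core.

```python
import math


def perf(r: float) -> float:
    """Assumed sequential performance of a core built from r base-core
    equivalents (square-root rule; an assumption, not a measured value)."""
    return math.sqrt(r)


def symmetric_speedup(f: float, n: int, r: int) -> float:
    """Speedup of a symmetric chip relative to one base core.

    f: fraction of the work that can run in parallel (0..1)
    n: total chip budget in base-core equivalents
    r: base-core equivalents used per core, so the chip has n / r cores
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if r < 1 or n < r:
        raise ValueError("need 1 <= r <= n")
    serial_time = (1.0 - f) / perf(r)        # one core runs the serial part
    parallel_time = f * r / (perf(r) * n)    # n / r cores share the parallel part
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Sweep the per-core size for a half-parallel workload on a 16-unit budget.
    for r in (1, 2, 4, 8, 16):
        print(r, round(symmetric_speedup(0.5, 16, r), 3))
```

With f = 0.5, n = 16, and r = 4, the sketch gives a speedup of 3.2, which illustrates how such models trade many small cores against fewer, larger ones.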

In contrast to these works, we present the Parameterized integrated power and performance (PIPP) analytical model, which jointly evaluates the tradeoffs among the core number, super-node size, processing element number, function unit number, system performance, and power. We also present first-hand experimental results to support and validate the proposed model, and then explore the influence of workload characteristics (thread-level parallelism, TLP; instruction-level parallelism, ILP; and data-level parallelism, DLP) on resource allocation under performance and power constraints.

II. System Abstraction

Using Amdahl's law as the basic analytical timing model [10], we try to predict the execution time. The abstract parallel architecture is shown in Fig. 1.

Fig. 1. Prototype hierarchical on-chip large-scale parallel architectures
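
As a rough illustration of using Amdahl's law as a timing model, the sketch below predicts execution time on n identical processing resources from a single-resource baseline. This is the textbook form of the law, not the paper's hierarchical formulation, which is not reproduced in this excerpt. The function names and the example numbers are hypothetical.

```python
def amdahl_time(t_single: float, f: float, n: int) -> float:
    """Predicted execution time on n resources.

    t_single: execution time on a single resource
    f: fraction of the work that can be parallelized (0..1)
    n: number of identical parallel resources (>= 1)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part is unchanged; the parallel part is divided by n.
    return t_single * ((1.0 - f) + f / n)


def amdahl_speedup(f: float, n: int) -> float:
    """Speedup relative to a single resource."""
    return 1.0 / ((1.0 - f) + f / n)


if __name__ == "__main__":
    # Illustrative numbers only: 100 time units, 90% parallel, 10 resources.
    print(amdahl_time(100.0, 0.9, 10))
    print(amdahl_speedup(0.9, 10))
```

Because the serial fraction is never divided, the speedup is bounded by 1 / (1 - f) no matter how large n becomes, which is why the split of resources across levels of a hierarchy matters.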

∗ Manuscript Received Mar. 2012; Accepted Jan. 2013. This work is supported by the National Natural Science Foundation of China (No.61070036, No.61133007).
