PIPP Model for Ultra-High-Performance TOPS-Level DSPs: A Combined Performance and Power Analysis
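This document presents the "Parameterized Integrated Power and Performance (PIPP) Model for Ultra High-Performance of TOPS level DSP", published in the Chinese Journal of Electronics, Vol. 22, No. 4, October 2013. Following a recent trend in timing and power analysis of general-purpose many-core chips, the authors YANG Hui, CHEN Shuming and WU Tiebin propose an analytical model for the large-scale parallel architectures used in high-performance embedded systems. The PIPP model takes the core number, super-node size, processing element number, and function unit number as its key parameters and evaluates their effect on system performance and power.
The paper starts from Amdahl's law, a basic tool for understanding how performance evolves as parallelism increases. As processor design moves toward many-core and large-scale parallel organization, the authors use this law as the timing basis of the PIPP model, which examines how workload characteristics (thread-level parallelism, TLP; instruction-level parallelism, ILP; and data-level parallelism, DLP) influence resource allocation under performance and power constraints.
The core of the work is analytical modeling that relates performance and power to the architectural parameters and shows how resource allocation should follow the workload. The results offer feasible options for designing a TOPS-level (tera operations per second) DSP architecture and give designers a theoretical basis for balancing hardware resources against energy consumption while meeting performance requirements.
The model may also help guide power and thermal management during chip design, since high-performance computing usually produces more heat and effective energy management and cooling matter for modern microprocessors. The paper is therefore of both practical and theoretical value for understanding and improving the design of high-performance digital signal processors.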
Chinese Journal of Electronics
Vol.22, No.4 Oct. 2013
Parameterized Integrated Power and Performance (PIPP) Model for Ultra High-Performance of TOPS level DSP∗
YANG Hui, CHEN Shuming and WU Tiebin
(School of Computer Science, National University of Defense Technology, Changsha 410073, China)
Abstract — Amdahl’s law is a simple and fundamental tool for understanding the evolution of performance as a function of parallelism. Following a recent trend in timing and power analysis of general-purpose many-core chips using this law, we develop a novel PIPP analytical model for evaluating the performance and power of hierarchical on-chip large-scale parallel architectures, taking the core number, super-node size, processing element number, and function unit number into consideration. We thereby investigate the influence of workload characteristics (thread-level parallelism, TLP; instruction-level parallelism, ILP; and data-level parallelism, DLP) on resource allocation under performance and power constraints. The results provide some feasible options for designing a TOPS-level DSP architecture as well as a theoretical basis for making the design more scalable.
Key words — Hierarchical architecture, Data-level parallel (DLP), Thread-level parallel (TLP), Instruction-level parallel (ILP), Model.
I. Introduction
DSPs are widely used in the embedded field. To meet the requirements of software radio [1], DSP performance has to reach 10TIPS by 2020 [2,3]. It is therefore urgent to build a TOPS-level DSP on a single chip.
Hierarchical architectures that combine Very long instruction word (VLIW), Single instruction multiple data (SIMD), tightly-coupled super-node, and multi-core techniques, and which can fully exploit the parallelism of applications at lower hardware cost, have been broadly utilized in current DSPs [2]. However, power scales at a higher pace than performance.
One of the main objectives of a system designer is to assess the impact of architectural choices on the variable to be optimized, from the highest levels of the design flow downwards. Current design methodologies follow two main strategies. The first relies on instruction set simulators (ISS) and cycle-accurate simulators [4,5]; these methods are too detailed to explore the system-level design space quickly. The second relies on analytical models, which can quickly identify advantageous architectures but are not detailed enough.
Hill and Marty introduced an analytical model relating processor performance to the number of cores in symmetric, asymmetric, and dynamic multi-core chips [6]. Another approach [7] extended Hill and Marty’s model to include energy. Ge [8,9] proposed a power-aware speedup model, intended to provide a general form of parallel speedup model that supports emerging power-aware architectures.
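To make the related work concrete, the following is a minimal Python sketch of the symmetric multi-core case from Hill and Marty's model [6]. It is not the PIPP model. The chip is assumed to have a budget of `n` base-core equivalents (BCEs), each core uses `r` BCEs, and a core's sequential performance is assumed to be `perf(r) = sqrt(r)`, the illustrative choice made in Hill and Marty's article. The function and parameter names are hypothetical.

```python
import math


def perf(r: float) -> float:
    """Sequential performance of one core built from r BCEs.

    Assumption: perf(r) = sqrt(r), as in Hill and Marty's illustration.
    """
    return math.sqrt(r)


def symmetric_speedup(f: float, n: float, r: float) -> float:
    """Speedup of a symmetric chip with n/r cores of r BCEs each.

    f is the fraction of the workload that can run in parallel.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if r <= 0 or r > n:
        raise ValueError("r must satisfy 0 < r <= n")
    # Serial phase: one core of performance perf(r) is active.
    serial_time = (1.0 - f) / perf(r)
    # Parallel phase: all n/r cores run, each at performance perf(r).
    parallel_time = f * r / (perf(r) * n)
    return 1.0 / (serial_time + parallel_time)
```

Under these assumptions, with `f = 0.9` and `n = 16`, sixteen one-BCE cores give a speedup of 6.4, while a single 16-BCE core gives 4.0, which illustrates the trade-off between many small cores and few large ones.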
In contrast to these works, we present the Parameterized integrated power and performance (PIPP) analytical model, which jointly evaluates the tradeoffs among the core number, super-node size, processing element number, function unit number, system performance, and power. We also present first-hand experimental results to support and validate the proposed model, and then explore the influence of workload characteristics (thread-level parallelism, TLP; instruction-level parallelism, ILP; and data-level parallelism, DLP) on resource allocation under performance and power constraints.
II. System Abstraction
Using Amdahl’s law as the basic analytical timing model [10], we try to predict the execution time. The abstract parallel architecture is shown in Fig.1.
Fig. 1. Prototype hierarchical on-chip large-scale parallel architectures
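As a rough illustration of how Amdahl's law can serve as a timing model, the Python sketch below predicts execution time for a fixed-size workload from its single-unit time and its parallel fraction. This is only the classical single-level form of the law. It does not reproduce the hierarchical PIPP equations, which are not given in this excerpt, and the function and parameter names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Classical Amdahl speedup on n_units identical processing units."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is not accelerated; the parallel part is split evenly.
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


def predicted_time(t_single: float, parallel_fraction: float, n_units: int) -> float:
    """Predicted execution time, given the time t_single on one unit."""
    return t_single / amdahl_speedup(parallel_fraction, n_units)
```

For instance, a workload that takes 100 time units on one unit and is 50% parallel is predicted to take 75 time units on two units. As `n_units` grows, the predicted time approaches the serial part, here 50 time units.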
∗Manuscript Received Mar. 2012; Accepted Jan. 2013. This work is supported by the National Natural Science Foundation of China (No.61070036, No.61133007).