Optimizing Compilers for Modern Architectures (with cover, English edition, copyable text)
Updated 2023-04-28
Optimizing Compilers for Modern Architectures: A Dependence-Based Approach
Chapter Draft of February 8, 2001
CHAPTER 1 Compiler Challenges for High-Performance Architectures
1.1 Overview and Goals
1.2 Pipelining
1.2.1 Pipelined Instruction Units
1.2.2 Pipelined Execution Units
1.2.3 Parallel Functional Units
1.2.4 Compiling for Scalar Pipelines
1.3 Vector Instructions
1.3.1 Vector Hardware Overview
1.3.2 Compiling for Vector Pipelines
1.4 Superscalar and VLIW Processors
1.4.1 Multiple-Issue Instruction Units
1.4.2 Compiling for Multiple-Issue Processors
1.5 Processor Parallelism
1.5.1 Compiling for Asynchronous Parallelism
1.6 Memory Hierarchy
1.6.1 Compiling for Memory Hierarchy
1.7 A Case Study: Matrix Multiplication
1.8 Advanced Compiler Technology
1.8.1 Dependence
1.8.2 Transformations
1.9 Chapter Summary
1.10 Case Studies
1.11 Historical Comments and References
1.12 Exercises
1.13 References
CHAPTER 2 Dependence: Theory and Practice
2.1 Introduction
2.2 Dependence and its Properties
2.2.1 Load-Store Classification
2.2.2 Dependence in Loops
2.2.3 Dependence and Transformations
2.2.4 Distance and Direction Vectors
2.2.5 Loop-carried and Loop-independent Dependences
2.2.5.1 Loop-Carried Dependence
2.2.5.2 Loop-Independent Dependences
2.2.5.3 Iteration Reordering
ADVANCED COMPILING FOR HIGH PERFORMANCE
2.3 Simple Dependence Testing
2.4 Parallelization and Vectorization
2.4.1 Parallelization
2.4.2 Vectorization
2.4.3 An Advanced Vectorization Algorithm
2.5 Chapter Summary
2.6 Case Studies
2.7 Historical Comments and References
2.8 Exercises
2.9 References
CHAPTER 3 Dependence Testing
3.1 Introduction
3.1.1 Background and Terminology
3.1.1.1 Indexes and Subscripts
3.1.1.2 Nonlinearity
3.1.1.3 Conservative Testing
3.1.1.4 Complexity
3.1.1.5 Separability
3.1.1.6 Coupled Subscript Groups
3.2 Dependence Testing Overview
3.2.1 Subscript Partitioning
3.2.2 Merging Direction Vectors
3.3 Single-Subscript Dependence Tests
3.3.1 ZIV Test
3.3.2 SIV Tests
3.3.2.1 Strong SIV Subscripts
3.3.2.2 Weak SIV Subscripts
3.3.2.3 Weak-zero SIV Subscripts
3.3.2.4 Weak-crossing SIV Subscripts
3.3.2.5 Complex Iteration Spaces
3.3.2.6 Symbolic SIV Dependence Tests
3.3.2.7 Breaking Conditions
3.3.2.8 An Exact SIV Test
3.3.3 Multiple Induction Variable Tests
3.3.3.1 GCD Test
3.3.3.2 Banerjee Inequality
3.3.3.3 Handling Symbolics in the Banerjee Inequality
3.3.3.4 Trapezoidal Banerjee Inequality
3.3.3.5 Testing for All Direction Vectors
3.4 Testing in Coupled Groups
3.4.1 The Delta Test
3.4.1.1 Constraints
3.4.1.2 Intersecting Constraints
3.4.1.3 Constraint Propagation
3.4.1.4 Precision and Complexity
3.4.2 More Powerful Multiple-Subscript Tests
3.5 An Empirical Study
3.6 Putting It All Together
3.7 Chapter Summary
3.8 Case Studies
3.9 Historical Comments and References
3.10 Exercises
3.11 References
CHAPTER 4 Preliminary Transformations
4.1 Introduction
4.2 Information Requirements
4.3 Loop Normalization
4.4 Data Flow Analysis
4.4.1 Definition-Use Chains
4.4.2 Dead Code Elimination
4.4.3 Constant Propagation
4.4.4 Static Single-Assignment Form
4.5 Induction-Variable Exposure
4.5.1 Forward Expression Substitution
4.5.2 Induction-Variable Substitution
4.5.3 Driving the Substitution Process
4.6 Chapter Summary
4.7 Case Studies
4.8 Historical Comments and References
4.9 Exercises
4.10 References
CHAPTER 5 Enhancing Fine-Grained Parallelism
5.1 Overview
5.2 Loop Interchange
5.2.1 Safety of Loop Interchange
5.2.2 Profitability of Loop Interchange
5.2.3 Loop Interchange and Vectorization
5.2.3.1 A Code Generation Framework
5.2.3.2 General Loop Selection and Interchange
5.3 Scalar Expansion
5.4 Scalar and Array Renaming
5.5 Node Splitting
5.6 Recognition of Reductions
5.7 Index-set Splitting
5.7.1 Threshold Analysis
5.7.2 Loop Peeling
5.7.3 Section-based Splitting
5.8 Run-time Symbolic Resolution
5.9 Loop Skewing
5.10 Putting It All Together
5.11 Complications of Real Machines
5.12 Chapter Summary
5.13 Case Studies
5.14 Historical Comments and References
5.15 Exercises
5.16 References
CHAPTER 6 Creating Coarse-Grained Parallelism
6.1 Introduction
6.2 Single-Loop Methods
6.2.1 Privatization
6.2.2 Loop Distribution
6.2.3 Alignment
6.2.4 Code Replication
6.2.5 Loop Fusion
6.2.5.1 Typed Fusion
6.2.5.2 Unordered and Ordered Typed Fusion
6.2.5.3 Cohort Fusion
6.3 Perfect Loop Nests
6.3.1 Loop Interchange
6.3.2 Loop Selection