没有合适的资源?快使用搜索试试~ 我知道了~
首页Optimizing Compilers for Modern Architectures
Optimizing Compilers for Modern Architectures

Optimizing Compilers for Modern Architectures.pdf
资源详情
资源评论
资源推荐

Chapter Draft of February 8, 2001
3
CHAPTER 1
Compiler Challenges for High-Performance Architectures
17
1.1
Overview and Goals
17
1.2
Pipelining
21
1.2.1 Pipelined Instruction Units 21
1.2.2 Pipelined Execution Units 23
1.2.3 Parallel Functional Units 24
1.2.4 Compiling for Scalar Pipelines. 25
1.3
Vector Instructions
29
1.3.1 Vector Hardware Overview 29
1.3.2 Compiling for Vector Pipelines 30
1.4
Superscalar and VLIW Processors
32
1.4.1 Multiple-Issue Instruction Units 32
1.4.2 Compiling for Multiple-Issue Processors 33
1.5
Processor Parallelism
35
1.5.1 Compiling for Asynchronous Parallelism 37
1.6
Memory Hierarchy
39
1.6.1 Compiling for Memory Hierarchy 41
1.7
A Case Study: Matrix Multiplication
42
1.8
Advanced Compiler Technology
47
1.8.1 Dependence 48
1.8.2 Transformations 50
1.9
Chapter Summary
51
1.10
Case Studies
51
1.11
Historical Comments and References
52
1.12
Exercises
53
1.13
References
54
CHAPTER 2
Dependence: Theory and Practice
57
2.1
Introduction
57
2.2
Dependence and its Properties
58
2.2.1 Load-Store Classification 60
2.2.2 Dependence in Loops 61
2.2.3 Dependence and Transformations 63
2.2.4 Distance and Direction Vectors 68
2.2.5 Loop-carried and Loop-independent Dependences 72
2.2.5.1 Loop-Carried Dependence 72
2.2.5.2 Loop-Independent Dependences 76
2.2.5.3 Iteration Reordering 78

4
ADVANCED COMPILING FOR HIGH PERFORMANCE
2.3
Simple Dependence Testing
79
2.4
Parallelization and Vectorization
82
2.4.1 Parallelization 82
2.4.2 Vectorization 83
2.4.3 An Advanced Vectorization Algorithm 86
2.5
Chapter Summary
93
2.6
Case Studies
93
2.7
Historical Comments and References
94
2.8
Exercises
95
2.9
References
96
CHAPTER 3
Dependence Testing
99
3.1
Introduction
99
3.1.1 Background and Terminology 101
3.1.1.1 Indexes and Subscripts 101
3.1.1.2 Nonlinearity 101
3.1.1.3 Conservative Testing 102
3.1.1.4 Complexity 103
3.1.1.5 Separability 103
3.1.1.6 Coupled Subscript Groups 104
3.2
Dependence Testing Overview
106
3.2.1 Subscript Partitioning 106
3.2.2 Merging Direction Vectors 107
3.3
Single-Subscript Dependence Tests
108
3.3.1 ZIV Test 108
3.3.2 SIV Tests 108
3.3.2.1 Strong SIV Subscripts 109
3.3.2.2 Weak SIV Subscripts 110
3.3.2.3 Weak-zero SIV Subscripts 111
3.3.2.4 Weak-crossing SIV Subscripts 112
3.3.2.5 Complex Iteration Spaces 114
3.3.2.6 Symbolic SIV Dependence Tests 117
3.3.2.7 Breaking Conditions 119
3.3.2.8 An Exact SIV Test 121
3.3.3 Multiple Induction Variable Tests 122
3.3.3.1 GCD Test 124
3.3.3.2 Banerjee Inequality 124
3.3.3.3 Handling Symbolics in the Banerjee Inequality 130
3.3.3.4 Trapezoidal Banerjee Inequality 131
3.3.3.5 Testing for All Direction Vectors 139
3.4
Testing in Coupled Groups
140

Chapter Draft of February 8, 2001
5
3.4.1 The Delta Test 141
3.4.1.1 Constraints 143
3.4.1.2 Intersecting Constraints 144
3.4.1.3 Constraint Propagation 145
3.4.1.4 Precision and Complexity 149
3.4.2 More Powerful Multiple-Subscript Tests 150
3.5
An Empirical Study
151
3.6
Putting It All Together
154
3.7
Chapter Summary
160
3.8
Case Studies
162
3.9
Historical Comments and References
163
3.10
Exercises
164
3.11
References
165
CHAPTER 4
Preliminary Transformations
169
4.1
Introduction
169
4.2
Information Requirements
172
4.3
Loop Normalization
173
4.4
Data Flow Analysis
176
4.4.1 Definition-Use Chains 176
4.4.2 Dead Code Elimination 180
4.4.3 Constant Propagation 181
4.4.4 Static Single-Assignment Form 184
4.5
Induction-Variable Exposure
192
4.5.1 Forward Expression Substitution 192
4.5.2 Induction-Variable Substitution 196
4.5.3 Driving the Substitution Process 200
4.6
Chapter Summary
204
4.7
Case Studies
204
4.8
Historical Comments and References
206
4.9
Exercises
207
4.10
References
208
CHAPTER 5
Enhancing Fine-Grained Parallelism
211
5.1
Overview
211
5.2
Loop Interchange
213

6
ADVANCED COMPILING FOR HIGH PERFORMANCE
5.2.1 Safety of Loop Interchange 214
5.2.2 Profitability of Loop Interchange 218
5.2.3 Loop Interchange and Vectorization 219
5.2.3.1 A Code Generation Framework 223
5.2.3.2 General Loop Selection and Interchange 223
5.3
Scalar Expansion
226
5.4
Scalar and Array Renaming
236
5.5
Node Splitting
244
5.6
Recognition of Reductions
247
5.7
Index-set Splitting
251
5.7.1 Threshold Analysis 251
5.7.2 Loop Peeling 253
5.7.3 Section-based Splitting 254
5.8
Run-time Symbolic Resolution
256
5.9
Loop Skewing
258
5.10
Putting It All Together
263
5.11
Complications of Real Machines
270
5.12
Chapter Summary
274
5.13
Case Studies
275
5.14
Historical Comments and References
280
5.15
Exercises
281
5.16
References
282
CHAPTER 6
Creating Coarse-Grained Parallelism
285
6.1
Introduction
285
6.2
Single-Loop Methods
287
6.2.1 Privatization 287
6.2.2 Loop Distribution 292
6.2.3 Alignment 293
6.2.4 Code Replication 297
6.2.5 Loop Fusion 302
6.2.5.1 Typed Fusion 308
6.2.5.2 Unordered and Ordered Typed Fusion 316
6.2.5.3 Cohort Fusion 318
6.3
Perfect Loop Nests
320
6.3.1 Loop Interchange 320
6.3.2 Loop Selection 325

Chapter Draft of February 8, 2001
7
6.3.3 Loop Reversal 328
6.3.4 Loop Skewing 329
6.3.5 Unimodular Transformations 334
6.3.6 Profitability-Based Methods 335
6.4
Imperfectly Nested Loops
338
6.4.1 Multilevel Loop Fusion 339
6.4.2 A Parallel Code Generation Algorithm 342
6.5
An Extended Example
347
6.6
Packaging of Parallelism 350
6.6.1 Strip Mining 351
6.6.2 Pipeline Parallelism 352
6.6.3 Scheduling Parallel Work 355
6.6.4 Guided Self-Scheduling 357
6.7
Chapter Summary
360
6.8
Case Studies 361
6.9
Historical Comments and References
367
6.10
Exercises
368
6.11
References
369
CHAPTER 7
Control Dependence
373
7.1 Introduction
373
7.2
If Conversion
375
7.2.1 Definition 376
7.2.2 Branch Classification 377
7.2.3 Forward Branches 378
7.2.4 Exit Branches 382
7.2.5 Backward Branches 388
7.2.6 Complete Forward Branch Removal 391
7.2.7 Simplification 394
7.2.8 Iterative Dependences 399
7.2.9 IF Reconstruction 405
7.3 Control Dependence 406
7.3.1 Constructing Control Dependence 409
7.3.2 Control Dependence in Loops 412
7.3.3 An Execution Model for Control Dependences 413
7.3.4 Application of Control Dependence to Parallelization 417
7.3.4.1 Control Dependence and Transformations 417
7.3.4.2 Generating Code 424
7.4 Chapter Summary 434
剩余832页未读,继续阅读















安全验证
文档复制为VIP权益,开通VIP直接复制

评论8