they are very fast as they are completely table-driven. Fortunately, as we will see,
the same ideas can also be applied to drive instruction selection.
In 1978 Glanville and Graham [111] presented a seminal
[Figure 3.2: Linearising a tree — an expression tree is flattened into the Polish prefix string + ∗ a b c.]
paper that describes how the same syntactic techniques used for syntax parsing can also be adapted to pattern matching
and selection. (This was also already hinted at, albeit vaguely,
by Feldman and Gries in 1968 [87, p. 107].) Glanville and Gra-
ham recognised that by linearising the trees using Polish prefix
notation (i.e., 1 + (2 + 3) is expressed as + 1 + 2 3, thus making parentheses obsolete; see also Figure 3.2), the machine
instructions can be expressed as a set of grammatical production
rules. One such set is given in Figure 3.3. With a modified LR(1)
parser, this can be used to drive instruction selection. We will
refer to this technique as the Glanville-Graham approach.
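As a small illustration of the linearisation step, the following Python sketch flattens an expression tree by a preorder walk. The tuple-based tree encoding is an assumption made for this example, not a representation used by Glanville and Graham.

```python
# Hypothetical sketch: linearising an expression tree into Polish
# prefix notation, as in Figure 3.2.

def linearise(node):
    """Flatten a tree into a Polish prefix token list (preorder walk)."""
    if isinstance(node, tuple):          # interior node: (op, child, child, ...)
        op, *children = node
        tokens = [op]
        for c in children:
            tokens += linearise(c)
        return tokens
    return [node]                        # leaf

tree = ("+", ("*", "a", "b"), "c")       # the tree from Figure 3.2
print(" ".join(linearise(tree)))         # → + * a b c
```

Because the operators' arities are fixed, the prefix string can be parsed back unambiguously, which is what makes the parentheses obsolete.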
In a rough sketch, the approach works as follows. The program string is
progressively parsed and converted into tokens, which are pushed onto a stack.
As each token is pushed, the algorithm makes a decision whether to shift (continue)
or to reduce (pop tokens from the stack). Which decision to make is deduced
from a precomputed table that has been generated from the grammar. Figure 3.4
shows the table computed from the grammar in Figure 3.3. A reduction can be
performed if there exists some rule that matches the top-most portion of the stack.
This corresponds to the pattern matching task. During reduction, the matched
items on the stack will be popped and replaced with the left-hand symbol of the
selected rule. Hence this corresponds to the pattern selection task. This process of
shifting and reducing continues until the entire program has been parsed, hopefully
ending up in an accept state via a start symbol. A walk-through of such an execution
is given in Figure 3.5.
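To make the shift-reduce loop concrete, here is a toy sketch in Python. It replaces the precomputed LR(1) table with a naive greedy matcher that reduces whenever some rule's right-hand side matches the top of the stack, so it only illustrates the mechanics; the grammar, rule set, and emitted mnemonics are all invented for this example.

```python
# A toy sketch (not the actual table-driven Glanville-Graham parser):
# a greedy shift-reduce loop over a Polish-prefix program string.

RULES = [                                 # (nonterminal, rhs, instruction)
    ("r", ("+", "r", "r"), "ADD"),
    ("r", ("*", "r", "r"), "MUL"),
    ("r", ("a",), "LOAD a"),
    ("r", ("b",), "LOAD b"),
    ("r", ("c",), "LOAD c"),
]

def select(tokens):
    stack, emitted = [], []
    for tok in tokens:
        stack.append(tok)                 # shift
        reduced = True
        while reduced:                    # reduce while some rule matches
            reduced = False
            for lhs, rhs, instr in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]  # pop the matched right-hand side
                    stack.append(lhs)      # replace it with the LHS symbol
                    emitted.append(instr)  # pattern selection: emit code
                    reduced = True
                    break
    assert stack == ["r"], "parse did not end in the start symbol"
    return emitted

print(select("+ * a b c".split()))
# → ['LOAD a', 'LOAD b', 'MUL', 'LOAD c', 'ADD']
```

Each reduction corresponds to matching a pattern (the rule's right-hand side) and selecting it (emitting its instruction and replacing the match with the rule's left-hand symbol), mirroring the two tasks described above.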
There are several key differences between a regular syntax parser and Glanville
and Graham’s algorithm. First, the instruction set grammar for a target machine is
normally highly ambiguous. This incurs many shift-reduce and reduce-reduce con-
flicts which have to be resolved in some manner. Glanville and Graham addressed
shift-reduce conflicts by always opting for a shift in these situations. The effect is that
the instruction selector will attempt to select the largest patterns possible, which is
most often the desired outcome. The idea of always selecting the largest possible
pattern is commonly known as maximum munching, or just maximum munch,
a term coined by Cattell [42] in his PhD thesis. Furthermore, reduce-reduce
conflicts are resolved with a simple heuristic that chooses the rule with the longest
right-hand side production. In equal-length conflicts the heuristic selects the first
rule defined in the grammar.
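The tie-breaking heuristic just described — longest right-hand side first, definition order second — can be sketched as follows. The candidate rule tuples are invented for illustration.

```python
# Hedged sketch of resolving a reduce-reduce conflict: among rules whose
# right-hand sides all match the stack top, pick the one with the longest
# RHS; on equal length, pick the rule defined earliest in the grammar.

def pick_rule(matching_rules):
    """matching_rules: list of (definition_index, rhs) pairs."""
    return min(matching_rules, key=lambda r: (-len(r[1]), r[0]))

candidates = [
    (0, ("+", "r", "r")),        # plain add
    (1, ("+", "r", "const")),    # add-immediate, same RHS length
    (2, ("r",)),                 # chain rule, shorter RHS
]
print(pick_rule(candidates))     # → (0, ('+', 'r', 'r'))
```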
Second, the reduce step is considerably more complicated as it needs to consider
both syntactic and semantic information. For instance, when a shift is performed on an input symbol r, information about the register that r represents is also
pushed on the stack. This is necessary in order to emit the machine instructions. In
addition, Glanville and Graham chose to incorporate register allocation into this
step.
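One way to picture this — a minimal sketch under assumed names, not the original formulation — is to pair each stack symbol with the register it denotes, so that a reduction has the semantic information it needs to emit an instruction:

```python
# Sketch: each stack entry pairs the grammar symbol with the register
# it denotes, so reducing '+ r r' can emit a concrete add instruction.

def reduce_add(stack, fresh_reg):
    """Reduce '+ r r' on top of the stack, emitting an add instruction."""
    (_, r2), (_, r1), (op, _) = stack.pop(), stack.pop(), stack.pop()
    assert op == "+"
    instr = f"add {fresh_reg}, {r1}, {r2}"
    stack.append(("r", fresh_reg))       # LHS symbol plus its register
    return instr

stack = [("+", None), ("r", "r1"), ("r", "r2")]
print(reduce_add(stack, "r3"))           # → add r3, r1, r2
print(stack)                             # → [('r', 'r3')]
```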