Building a Compiler Step by Step: From Interpreters to Assembly Code


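The paper "An Incremental Approach to Compiler Construction" was written by Abdulaziz Ghuloum of the Department of Computer Science at Indiana University. Its aim is to break down the barriers that novice compiler writers face and to show them that compiler construction is not out of reach. Compilers are commonly regarded as carefully crafted magic, built by experts and hard for beginners to understand. Traditional compiler textbooks tend to be too specialized for beginners, and real-world compilers are too complex to serve as teaching tools. As a result, beginners often write an interpreter instead as their entry point.

The paper's central claim is that building a compiler can be as easy as building an interpreter. The author demonstrates this by constructing a compiler that handles a large subset of the Scheme programming language and generates assembly code for the Intel x86 architecture, the dominant architecture in personal computing. Development is broken into many small incremental steps, and each step yields a fully working compiler that handles progressively more complex programs. This incremental approach lets beginners understand how a compiler works piece by piece, instead of facing a large, hard-to-understand project all at once.

At each incremental stage, the author explains how to add a new language feature, such as variable declarations, control structures (conditionals and loops), and procedure definitions and calls. These features let the compiler grow from handling basic syntactic forms to handling higher-level programming concepts. The paper also addresses how key compiler components such as lexical analysis, parsing, semantic analysis, and code generation are implemented. The author may also discuss error handling and optimization techniques, both indispensable in real compilers. In this way, beginners can gradually master key concepts of compiler design, such as building abstract syntax trees (ASTs), generating intermediate code, and optimizing target code.

Overall, "An Incremental Approach to Compiler Construction" offers a practical, accessible path for beginners to build a complete, working compiler step by step. It narrows the gap between theory and practice and makes compiler construction less mysterious: a skill that can be mastered through incremental study and practice.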

A large subset of Scheme’s core forms (lambda, quote, set!, etc.) and extended forms (cond, case, letrec, internal define, etc.) must be supported by the compiler. Although most of these forms are not essential, their presence allows us to write our programs in a more natural way. In implementing the extended forms, we show how a large number of syntactic forms can be added without changing the core language that the compiler supports.

A large collection of primitives (cons, car, vector?, etc.) and library procedures (map, apply, list->vector, etc.) needs to be implemented. Some of these library procedures can be implemented directly, while others require some added support from the compiler. For example, some of the primitives cannot be implemented without supporting variable-arity procedures, and others require the presence of apply. Implementing a writer and a reader requires adding a way to communicate with an external run-time system.

3. Writing a Compiler in 24 Small Steps

Now that we have described the development methodology, we turn our attention to the actual steps taken in constructing a compiler. This section is a brief description of 24 incremental stages: the first is a small language composed only of small integers, and the last covers most of the requirements of R5RS. A more detailed presentation of these stages is in the accompanying extended tutorial.

3.1 Integers

The simplest language that we can compile and test is composed of the fixed-size integers, or fixnums. Let’s write a small compiler that takes a fixnum as input and produces a program in assembly that returns that fixnum. Since we don’t know yet how to do that, we ask for some help from another compiler that does know: gcc. Let’s write a small C function that returns an integer:

int scheme_entry(void){
    return 42;
}

Let’s compile it using gcc -O3 -fomit-frame-pointer -S test.c and see the output. The most relevant lines of the output file are the following:

1.     .text
2.     .p2align 4,,15
3.     .globl scheme_entry
4.     .type scheme_entry, @function
5. scheme_entry:
6.     movl $42, %eax
7.     ret

Line 1 starts a text segment, where code is located. Line 2 aligns the beginning of the procedure to a 16-byte boundary (.p2align 4 requests 2^4-byte alignment; this is not important at this point). Line 3 informs the assembler that the scheme_entry label is global so that it becomes visible to the linker. Line 4 says that scheme_entry is a function. Line 5 denotes the start of the scheme_entry procedure. Line 6 sets the value of the %eax register to 42, and line 7 returns control to the caller, which expects the returned value to be in the %eax register.

Generating this file from Scheme is straightforward. Our compiler takes an integer as input and prints the given assembly with the input substituted in for the value to be returned.

;; emit (not shown) prints one formatted line of assembly.
(define (compile-program x)
  (emit "movl $~a, %eax" x)   ; return the literal x in %eax
  (emit "ret"))
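For readers who want to follow along in C instead of Scheme, here is a minimal sketch of the same first-step compiler. It is not the paper's code. The buffer-based emit helper and the compile_program signature are hypothetical. The Scheme version emits only two instructions; this sketch also emits the .text, .globl, and .type directives and the label from the gcc listing. That choice assumes the output should assemble and link against the C driver on a 32-bit x86 toolchain using the GNU assembler.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append one formatted line of assembly, plus a newline, to buf.
   Hypothetical C stand-in for the Scheme emit helper. */
static void emit(char *buf, size_t cap, const char *fmt, ...) {
    size_t used = strlen(buf);
    assert(used + 1 < cap);            /* room for text and the NUL */
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf + used, cap - used, fmt, ap);
    va_end(ap);
    used = strlen(buf);
    assert(used + 1 < cap);            /* room for the newline */
    buf[used] = '\n';
    buf[used + 1] = '\0';
}

/* Step 1: the whole source program is a single integer literal x.
   The generated scheme_entry simply returns x in %eax. */
void compile_program(char *buf, size_t cap, int x) {
    buf[0] = '\0';
    emit(buf, cap, "    .text");
    emit(buf, cap, "    .globl scheme_entry");
    emit(buf, cap, "    .type scheme_entry, @function");
    emit(buf, cap, "scheme_entry:");
    emit(buf, cap, "    movl $%d, %%eax", x);
    emit(buf, cap, "    ret");
}
```

After compile_program(buf, sizeof buf, 42), buf holds six lines, and the fifth is movl $42, %eax. Writing buf to a .s file and linking it with the driver should then print 42.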

To test our implementation, we write a small C run-time system that calls our scheme_entry and prints the value it returns:

/* a simple driver for scheme_entry */
#include <stdio.h>

int scheme_entry(void);   /* defined in the generated assembly file */

int main(int argc, char** argv){
    printf("%d\n", scheme_entry());
    return 0;
}

3.2 Immediate Constants

Values in Scheme are not limited to the fixnum integers. Booleans, characters, and the empty list form a collection of immediate values. Immediate values are those that can be stored directly in a machine word and therefore do not require additional storage. The types of the immediate objects in Scheme are disjoint; consequently, the implementation cannot use fixnums to denote booleans or characters. The types must also be available at run time to allow the driver to print the values appropriately and to allow us to provide the type predicates (discussed in the next step).

One way of encoding the type information is by dedicating some of the lower bits of the machine word to type information and using the rest of the machine word for storing the value. Every type of value is defined by a mask and a tag. The mask defines which bits of the integer are used for the type information, and the tag defines the value of these bits.

For fixnums, the lower two bits (mask = 11₂) must be 0 (tag = 00₂). This leaves 30 bits of the 32-bit word to hold the value of a fixnum. Characters are tagged with 8 bits (tag = 00001111₂), leaving 24 bits for the value (7 of which are actually used to encode the ASCII characters). Booleans are given a 7-bit tag (tag = 0011111₂) and a 1-bit value. The empty list is given the value 00101111₂.
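The tag layout above can be made concrete with a small C sketch of the encoding that immediate-rep performs. It assumes a 32-bit machine word and takes every tag value from the text. It also assumes that the boolean's single value bit sits at bit 7, directly above its 7-bit tag, so #f encodes as 0x1F and #t as 0x9F. All macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Tag layout from the text, assuming 32-bit machine words. */
#define FIXNUM_MASK  0x03u   /* low 2 bits */
#define FIXNUM_TAG   0x00u   /* 00 */
#define FIXNUM_SHIFT 2
#define CHAR_MASK    0xFFu   /* low 8 bits */
#define CHAR_TAG     0x0Fu   /* 00001111 */
#define CHAR_SHIFT   8
#define BOOL_MASK    0x7Fu   /* low 7 bits */
#define BOOL_TAG     0x1Fu   /* 0011111 */
#define BOOL_SHIFT   7       /* assumed: value bit sits just above the tag */
#define EMPTY_LIST   0x2Fu   /* 00101111 */

#define FIXNUM_MIN (-(INT32_C(1) << 29))
#define FIXNUM_MAX ((INT32_C(1) << 29) - 1)

/* Encode a 30-bit fixnum by shifting it into the upper bits.
   Multiplying avoids left-shifting a negative signed value. */
uint32_t encode_fixnum(int32_t n) {
    assert(n >= FIXNUM_MIN && n <= FIXNUM_MAX);
    return (uint32_t)(n * (1 << FIXNUM_SHIFT));
}

/* Encode an ASCII character in the bits above the 8-bit tag. */
uint32_t encode_char(unsigned char c) {
    assert(c < 128);
    return ((uint32_t)c << CHAR_SHIFT) | CHAR_TAG;
}

/* Encode #f as value bit 0 and #t as value bit 1 above the tag. */
uint32_t encode_bool(int b) {
    return ((uint32_t)(b != 0) << BOOL_SHIFT) | BOOL_TAG;
}

/* A word belongs to a type when its masked low bits equal the tag. */
int has_tag(uint32_t word, uint32_t mask, uint32_t tag) {
    return (word & mask) == tag;
}
```

For example, encode_fixnum(42) yields 168 (42 × 4), and encode_char('A') yields 0x410F. Every non-fixnum tag ends in the bits 11, so the fixnum test can never accept a character, a boolean, or the empty list.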

We extend our compiler to handle the immediate types appropriately. The code generator must convert the different immediate values to the corresponding machine integer values.

(define (compile-program x)
  (define (immediate-rep x)
    (cond
      ((integer? x) (shift x fixnum-shift))
      ...))
  (emit "movl $~a, %eax" (immediate-rep x))
  (emit "ret"))

The driver must also be extended to handle the newly added values. The following code illustrates the concept:

#include <stdio.h>

#define fixnum_mask 3
#define fixnum_tag 0
#define fixnum_shift 2
...

int main(int argc, char** argv){
    int val = scheme_entry();
    if((val & fixnum_mask) == fixnum_tag){
        printf("%d\n", val >> fixnum_shift);
    } else if(val == empty_list){
        printf("()\n");
    } ...
    return 0;
}
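The driver above elides most of its cases, so here is a hedged, self-contained sketch of the complete dispatch for the four immediate types. It is written as a function that formats into a buffer, so it can be exercised without a compiled scheme_entry. It reuses the tag values from the text and assumes 32-bit words with the boolean value bit at bit 7. It produces Scheme's external forms (42, #\A, #t, #f, ()). The name format_value is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FIXNUM_MASK  0x03u
#define FIXNUM_TAG   0x00u
#define CHAR_MASK    0xFFu
#define CHAR_TAG     0x0Fu
#define CHAR_SHIFT   8
#define BOOL_MASK    0x7Fu
#define BOOL_TAG     0x1Fu
#define BOOL_SHIFT   7       /* assumed position of the boolean value bit */
#define EMPTY_LIST   0x2Fu

/* Write the printed form of a Scheme immediate into buf.
   Returns the number of characters stored, or -1 for an unknown tag. */
int format_value(uint32_t val, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    if ((val & FIXNUM_MASK) == FIXNUM_TAG) {
        /* The low two bits are zero, so dividing by 4 is exact. The cast
           relies on two's-complement conversion (as in gcc and clang). */
        int32_t n = (int32_t)val / 4;
        snprintf(buf, cap, "%ld", (long)n);
    } else if ((val & CHAR_MASK) == CHAR_TAG) {
        snprintf(buf, cap, "#\\%c", (char)(val >> CHAR_SHIFT));
    } else if ((val & BOOL_MASK) == BOOL_TAG) {
        snprintf(buf, cap, "%s", (val >> BOOL_SHIFT) ? "#t" : "#f");
    } else if (val == EMPTY_LIST) {
        snprintf(buf, cap, "()");
    } else {
        snprintf(buf, cap, "#<unknown 0x%08lx>", (unsigned long)val);
        return -1;
    }
    return (int)strlen(buf);
}
```

A driver's main could call format_value on the result of scheme_entry and print the buffer. The order of the tests is arbitrary, because the tags are disjoint under their masks: no word can match two of them.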

3.3 Unary Primitives

We extend the language now to include calls to primitives that accept one argument. We start with the simplest of these primitives: add1 and sub1. To compile an expression in the form (add1 e), we first emit the code for e. That code would evaluate e, placing its value in the %eax register. What remains to be done is incrementing the value of the %eax register by 4, the fixnum representation of 1.

Scheme and Functional Programming, 2006 29
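The compilation scheme described in this step can be sketched in C for a tiny language of integer literals wrapped in add1 and sub1 calls. Fixnums are stored shifted left by two bits, so adding 1 to a fixnum means adding 4 to its machine representation. Each primitive therefore becomes a single addl or subl on %eax, emitted after the code for its argument. The Expr type, compile_expr, and the exact instructions emitted are assumptions of this sketch rather than the paper's listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST: an integer literal or a unary primitive call. */
typedef enum { LIT, ADD1, SUB1 } Kind;

typedef struct Expr {
    Kind kind;
    int value;                 /* used when kind == LIT */
    const struct Expr *arg;    /* used when kind == ADD1 or SUB1 */
} Expr;

/* Append text to buf, failing loudly if it would not fit. */
static void append(char *buf, size_t cap, const char *text) {
    size_t used = strlen(buf);
    assert(used + strlen(text) < cap);
    strcpy(buf + used, text);
}

/* Emit code that leaves the encoded value of e in %eax.
   buf must already hold a NUL-terminated string (e.g. ""). */
void compile_expr(const Expr *e, char *buf, size_t cap) {
    char line[64];
    switch (e->kind) {
    case LIT:
        /* Fixnum representation: value * 4 (a shift of 2). */
        snprintf(line, sizeof line, "    movl $%d, %%eax\n", e->value * 4);
        append(buf, cap, line);
        break;
    case ADD1:
        compile_expr(e->arg, buf, cap);            /* argument value in %eax */
        append(buf, cap, "    addl $4, %eax\n");   /* +1 as a fixnum */
        break;
    case SUB1:
        compile_expr(e->arg, buf, cap);            /* argument value in %eax */
        append(buf, cap, "    subl $4, %eax\n");   /* -1 as a fixnum */
        break;
    }
}
```

For (add1 41), the sketch emits movl $164, %eax followed by addl $4, %eax, which leaves 168, the representation of 42, in %eax. The argument's code always leaves its result in %eax, so nesting needs no extra registers: each add1 or sub1 appends exactly one instruction.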
