not completely obvious. These restrictions and their mo-
tivation are described as they arise. Figure 3 is the ASDL
description of a trivial programming language.
2.1 Lexical Issues
upper     = "A" | ... | "Z"
lower     = "a" | ... | "z"
alpha     = "_" | upper | lower
alpha_num = alpha | "0" | ... | "9"
typ_id    = lower {alpha_num}
con_id    = upper {alpha_num}
id        = typ_id | con_id
Figure 2: Lexical structure
Figure 2 is a description of the lexical structure of to-
kens used in the ASDL grammar in Figure 1. The names
of constructors and types in the description contain in-
formal semantic information that should be preserved
by a tool when translating descriptions into implemen-
tations. To keep the mapping from ASDL names to tar-
get language names simple, the names of types and con-
structors are restricted to the intersection of valid iden-
tifiers in the initial set of target languages. To help the
reader distinguish between types and constructor names,
types are required to begin with a lower case letter and
constructor names must begin with an upper case letter.
Rather than restricting ASDL names to exclude the
union of keywords in all target languages, ASDL tools
will have to keep track of and correct conflicts between
target language keywords and the type and constructor
names.
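The lexical rules of Figure 2 can be checked mechanically. The following is a minimal sketch in C, not part of any ASDL tool: the names id_kind and classify_id are hypothetical, and the explicit character ranges assume an ASCII-compatible character set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; ASDL does not define them. */
enum id_kind { ID_INVALID, ID_TYPE, ID_CONSTRUCTOR };

/* Character classes from Figure 2 (ASCII ranges assumed). */
static int is_upper(char c) { return c >= 'A' && c <= 'Z'; }
static int is_lower(char c) { return c >= 'a' && c <= 'z'; }
static int is_alpha_num(char c)
{
    return c == '_' || is_upper(c) || is_lower(c) || (c >= '0' && c <= '9');
}

/* Classify a name as typ_id (lower case first letter), con_id (upper
 * case first letter), or neither. */
enum id_kind classify_id(const char *s)
{
    enum id_kind kind;
    size_t i;

    assert(s != NULL);
    if (is_lower(s[0]))
        kind = ID_TYPE;
    else if (is_upper(s[0]))
        kind = ID_CONSTRUCTOR;
    else
        return ID_INVALID;          /* empty, or starts with '_' or a digit */

    for (i = 1; s[i] != '\0'; i++)
        if (!is_alpha_num(s[i]))
            return ID_INVALID;
    return kind;
}
```

Under these rules a name such as exp_list is classified as a type and Compound as a constructor, while a name with a leading underscore is rejected. Checking for clashes with target language keywords is a separate step that this sketch does not attempt.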
ASN.1 has similar restrictions. However, the ASN.1
equivalent of ASDL types must begin with an upper case
letter, and non-type identifiers must begin with a lower
case letter. The ASN.1 restrictions are incompatible with
many common stylistic conventions in ML, Java, C++,
and C. For example, enumerated constants in ASN.1
must begin with a lowercase letter, but C style languages
conventionally use all uppercase identifiers for enumer-
ated constants.
2.2 ASDL Fundamentals
An ASDL description consists of three fundamental
constructs: types, constructors, and productions. A type
is defined by productions that enumerate the constructors
for that type. In Figure 3 the first production describes
a stm type. A value of the stm type is created by
one of three different constructors: Compound, Assign,
and Print. Each of these constructors has a sequence of
fields that describe the type of values associated with a
constructor.
stm = Compound(stm, stm)
    | Assign(identifier, exp)
    | Print(exp_list)
exp_list = ExpList(exp, exp_list) | Nil
exp = Id(identifier)
    | Num(int)
    | Op(exp, binop, exp)
binop = Plus | Minus | Times | Div
Figure 3: Simple ASDL description
The Compound constructor has two fields whose values
are of type stm. One can interpret the production as
defining the structure of stm trees, which can have three
different kinds of nodes (Compound, Assign, and Print),
where the Compound node has two children that are
subtrees with the structure of a stm tree.
Notice that the binop type consists of only construc-
tors which have no fields. Types like binop are therefore
finite enumerations of values. Tools can easily recog-
nize this and represent these types as enumerations in
the target language. ASDL does not provide an explicit
enumeration type, unlike ASN.1 and the various IDLs.
Tools should recognize this idiom and use an appropri-
ate encoding.
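One way a tool might recognize the enumeration idiom is to check that every constructor of a type has zero fields. The sketch below assumes a hypothetical in-memory form of a production (struct constructor and struct type_def are invented for illustration) and writes out the binop and exp types of Figure 3 by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical representation of a parsed production. */
struct constructor {
    const char *name;
    size_t      num_fields;   /* how many fields the constructor carries */
};

struct type_def {
    const char               *name;
    const struct constructor *cons;
    size_t                    num_cons;
};

/* Returns 1 if the type has at least one constructor and none of its
 * constructors has fields, so it can be emitted as an enumeration. */
int is_enumeration(const struct type_def *t)
{
    size_t i;

    assert(t != NULL);
    if (t->num_cons == 0)
        return 0;
    for (i = 0; i < t->num_cons; i++)
        if (t->cons[i].num_fields != 0)
            return 0;
    return 1;
}

/* Two types from Figure 3, encoded by hand. */
static const struct constructor binop_cons[] = {
    { "Plus", 0 }, { "Minus", 0 }, { "Times", 0 }, { "Div", 0 }
};
static const struct constructor exp_cons[] = {
    { "Id", 1 }, { "Num", 1 }, { "Op", 3 }
};
const struct type_def binop_type = { "binop", binop_cons, 4 };
const struct type_def exp_type   = { "exp",   exp_cons,   3 };
```

With this encoding, is_enumeration accepts binop_type and rejects exp_type, because Id, Num, and Op all carry fields.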
There are three primitive pre-defined types in ASDL.
Figure 3 uses two of them: int and identifier. The int
type represents signed integers of infinite precision.
Specific tools may choose to produce language interfaces
that represent them as integers of finite precision. These
language interfaces should appropriately signal an error
when they are unable to represent such a value during
unpickling. The identifier type is analogous to Lisp
symbols. ASDL also provides a primitive string type.
2.3 Generating Code from ASDL Descriptions
From the definitions in Figure 3, it is easy to auto-
matically generate data type declarations in target lan-
guages such as C, C++, Java, and ML. For languages
like C, each type is represented as a tagged union of val-
ues. Languages like Java and C++ have a single abstract
base class for each type and concrete subclasses of the
base class for each variant of the type.
Figure 4 shows one way to translate the stm type into
C. Each ASDL type is represented as a pointer to a struc-
ture. The structure contains a “kind” tag that indicates
which variant of the union the current value holds. It is
also convenient to have functions that allocate space and
properly initialize the different variants of stm. Notice
that the binop type is translated as an enumeration.
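To complement the stm translation, the following sketch applies the same tagged-union approach to the exp and binop types of Figure 3. It is not the paper's Figure 4. The names exp_ty, binop_ty, the _kind tags, exp_size, and free_exp are assumptions chosen for illustration, the ASDL int is narrowed to a C int, and identifiers are held as borrowed C strings.

```c
#include <assert.h>
#include <stdlib.h>

/* binop has only field-less constructors, so it becomes an enum. */
typedef enum { Plus, Minus, Times, Div } binop_ty;

/* Each ASDL type is a pointer to a structure with a "kind" tag. */
typedef struct exp_ *exp_ty;
struct exp_ {
    enum { Id_kind, Num_kind, Op_kind } kind;
    union {
        struct { const char *name; } id;        /* borrowed, not copied */
        struct { int value; } num;
        struct { exp_ty left; binop_ty op; exp_ty right; } op;
    } v;
};

static exp_ty new_exp(void)
{
    exp_ty e = malloc(sizeof *e);
    if (e == NULL)
        abort();                    /* out of memory: give up in this sketch */
    return e;
}

/* Allocation functions, one per constructor. */
exp_ty Id(const char *name)
{
    exp_ty e = new_exp();
    e->kind = Id_kind;
    e->v.id.name = name;
    return e;
}

exp_ty Num(int value)
{
    exp_ty e = new_exp();
    e->kind = Num_kind;
    e->v.num.value = value;
    return e;
}

exp_ty Op(exp_ty left, binop_ty op, exp_ty right)
{
    exp_ty e = new_exp();
    assert(left != NULL && right != NULL);
    e->kind = Op_kind;
    e->v.op.left = left;
    e->v.op.op = op;
    e->v.op.right = right;
    return e;
}

/* Counts the nodes of an exp tree by dispatching on the kind tag. */
int exp_size(exp_ty e)
{
    if (e->kind == Op_kind)
        return 1 + exp_size(e->v.op.left) + exp_size(e->v.op.right);
    return 1;
}

/* Releases a tree built with the functions above. */
void free_exp(exp_ty e)
{
    if (e->kind == Op_kind) {
        free_exp(e->v.op.left);
        free_exp(e->v.op.right);
    }
    free(e);
}
```

For example, Op(Num(1), Plus, Op(Id("x"), Times, Num(2))) builds a five-node tree for the expression 1 + x * 2. Code that consumes the tree switches on the kind tag before touching the union, as exp_size does.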