ECMAScript Language Specification Edition 3 24-Mar-00
5 Notational Conventions
5.1 Syntactic and Lexical Grammars
This section describes the context-free grammars used in this specification to define the lexical and syntactic
structure of an ECMAScript program.
5.1.1 Context-Free Grammars
A context-free grammar consists of a number of productions. Each production has an abstract symbol called a
nonterminal as its left-hand side, and a sequence of zero or more nonterminal and terminal symbols as its right-
hand side. For each grammar, the terminal symbols are drawn from a specified alphabet.
Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given context-
free grammar specifies a language, namely, the (perhaps infinite) set of possible sequences of terminal symbols
that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for
which the nonterminal is the left-hand side.
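As an informal illustration of this derivation process, the following sketch enumerates the short sentences of a toy grammar by repeatedly replacing the leftmost nonterminal with each of its right-hand sides. The grammar (goal symbol Sum, terminals "a", "b" and "+") and the helper names expandOnce and language are hypothetical and are not part of this specification.

```javascript
// Hypothetical toy grammar, not taken from the specification:
//   Sum  : Term
//        | Term + Sum
//   Term : a
//        | b
var grammar = {
  Sum: [["Term"], ["Term", "+", "Sum"]],
  Term: [["a"], ["b"]]
};

function isNonterminal(symbol) {
  return Object.prototype.hasOwnProperty.call(grammar, symbol);
}

// Replace the leftmost nonterminal with each of its right-hand sides.
// Returns null when the sentence consists of terminal symbols only.
function expandOnce(sentence) {
  for (var i = 0; i < sentence.length; i++) {
    if (isNonterminal(sentence[i])) {
      return grammar[sentence[i]].map(function (rhs) {
        return sentence.slice(0, i).concat(rhs, sentence.slice(i + 1));
      });
    }
  }
  return null;
}

// Collect every terminal-only sentence of at most maxLength symbols
// that can be derived from the goal symbol.
function language(goal, maxLength) {
  var pending = [[goal]];
  var results = [];
  while (pending.length > 0) {
    var sentence = pending.shift();
    if (sentence.length > maxLength) continue;
    var next = expandOnce(sentence);
    if (next === null) results.push(sentence.join(" "));
    else pending = pending.concat(next);
  }
  return results.sort();
}

var sentences = language("Sum", 3);
// sentences is ["a", "a + a", "a + b", "b", "b + a", "b + b"]
```

Without the length bound the enumeration would not terminate, because the language of this toy grammar is infinite.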
5.1.2 The Lexical and RegExp Grammars
A lexical grammar for ECMAScript is given in section 7. This grammar has as its terminal symbols the characters of
the Unicode character set. It defines a set of productions, starting from the goal symbol InputElementDiv or
InputElementRegExp, that describe how sequences of Unicode characters are translated into a sequence of input
elements.
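The two goal symbols exist because the character "/" can begin either a division punctuator or a regular expression literal, depending on the syntactic context (this is described with the lexical grammar in section 7). The following lines are a small illustration only; the variable names are arbitrary.

```javascript
var a = 12, b = 3, g = 2;

// After an identifier, "/" is scanned as a division punctuator.
var quotient = a / b / g;   // (12 / 3) / 2, which is 2

// Where an expression is expected to begin, "/" starts a regular expression literal.
var pattern = /b/g;         // a RegExp with source "b" and the global flag
```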
Input elements other than white space and comments form the terminal symbols for the syntactic grammar for
ECMAScript and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and
punctuators of the ECMAScript language. Moreover, line terminators, although not considered to be tokens, also
become part of the stream of input elements and guide the process of automatic semicolon insertion (section 7.9).
Simple white space and single-line comments are discarded and do not appear in the stream of input elements for
the syntactic grammar. A MultiLineComment (that is, a comment of the form “/*…*/” regardless of whether it
spans more than one line) is likewise simply discarded if it contains no line terminator; but if a MultiLineComment
contains one or more line terminators, then it is replaced by a single line terminator, which becomes part of the
stream of input elements for the syntactic grammar.
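The effect of this rule can be observed through automatic semicolon insertion after return. The sketch below is illustrative only; it uses the Function constructor merely so that the source text, including an explicit line terminator, can be written as a string, and the names sameLine and splitLine are arbitrary.

```javascript
// The comment contains no line terminator, so it is simply discarded.
var sameLine = new Function("return /* no line terminator */ 42;");

// The comment contains a line terminator, so it is replaced by a single line
// terminator. A semicolon is then inserted after "return".
var splitLine = new Function("return /* spans\n two lines */ 42;");

var first = sameLine();    // 42
var second = splitLine();  // undefined
```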
A RegExp grammar for ECMAScript is given in section 15.10. This grammar also has as its terminal symbols the
characters of the Unicode character set. It defines a set of productions, starting from the goal symbol Pattern, that
describe how sequences of Unicode characters are translated into regular expression patterns.
Productions of the lexical and RegExp grammars are distinguished by having two colons “::” as separating
punctuation. The lexical and RegExp grammars share some productions.
5.1.3 The Numeric String Grammar
Another grammar is used for translating strings into numeric values. This grammar is similar to the part of the
lexical grammar having to do with numeric literals and has as its terminal symbols the characters of the Unicode
character set. This grammar appears in section 9.3.1.
Productions of the numeric string grammar are distinguished by having three colons “:::” as separating punctuation.
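The numeric string grammar is the one applied when a string is converted with Number. The following sketch shows a few conversions; the sample strings are arbitrary, and the comments give the results only informally.

```javascript
var samples = ["42", "  42\n", "", "0x1F", "1e3", "010", "12px"];

var converted = samples.map(function (text) {
  return Number(text);
});

// "42"     converts to 42
// "  42\n" converts to 42   (surrounding white space is permitted)
// ""       converts to 0    (the empty string is permitted)
// "0x1F"   converts to 31   (hexadecimal integers are permitted)
// "1e3"    converts to 1000
// "010"    converts to 10   (leading zeros do not select octal)
// "12px"   converts to NaN  (the whole string must match the grammar)
```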
5.1.4 The Syntactic Grammar
The syntactic grammar for ECMAScript is given in sections 11, 12, 13 and 14. This grammar has ECMAScript
tokens defined by the lexical grammar as its terminal symbols (section 5.1.2). It defines a set of productions,
starting from the goal symbol Program, that describe how sequences of tokens can form syntactically correct
ECMAScript programs.
When a stream of Unicode characters is to be parsed as an ECMAScript program, it is first converted to a stream of
input elements by repeated application of the lexical grammar; this stream of input elements is then parsed by a
single application of the syntactic grammar. The program is syntactically in error if the tokens in the stream of input
elements cannot be parsed as a single instance of the goal nonterminal Program, with no tokens left over.
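The two phases can be sketched with a toy language that is much smaller than ECMAScript. Everything in the following example is hypothetical: the token types, the goal symbol Sum and the functions tokenize and parseGoal are invented for illustration, and the parser evaluates the input instead of building a parse tree.

```javascript
// Phase 1: repeated application of a toy lexical grammar.
// White space is discarded; numbers and punctuators become tokens.
function tokenize(source) {
  var tokens = [];
  var pattern = /\s+|(\d+)|([+()])|(.)/g;
  var match;
  while ((match = pattern.exec(source)) !== null) {
    if (match[1] !== undefined) {
      tokens.push({ type: "number", text: match[1] });
    } else if (match[2] !== undefined) {
      tokens.push({ type: "punctuator", text: match[2] });
    } else if (match[3] !== undefined) {
      throw new SyntaxError("Unexpected character " + match[3]);
    }
  }
  return tokens;
}

// Phase 2: a single application of a toy syntactic grammar.
//   Sum     : Primary
//           | Sum + Primary
//   Primary : number
//           | ( Sum )
function parseGoal(source) {
  var tokens = tokenize(source);
  var position = 0;

  function peek() {
    return position < tokens.length ? tokens[position].text : null;
  }

  function parsePrimary() {
    var token = tokens[position];
    if (token === undefined) throw new SyntaxError("Unexpected end of input");
    if (token.type === "number") {
      position++;
      return Number(token.text);
    }
    if (token.text === "(") {
      position++;
      var value = parseSum();
      if (peek() !== ")") throw new SyntaxError("Expected )");
      position++;
      return value;
    }
    throw new SyntaxError("Unexpected token " + token.text);
  }

  function parseSum() {
    var total = parsePrimary();
    while (peek() === "+") {
      position++;
      total += parsePrimary();
    }
    return total;
  }

  var result = parseSum();
  // The input is in error if any tokens remain after the goal symbol.
  if (position !== tokens.length) {
    throw new SyntaxError("Tokens left over after the goal symbol");
  }
  return result;
}
```

With these definitions, parseGoal("1 + (2 + 3)") returns 6, whereas parseGoal("1 + 2 )") throws a SyntaxError because a token is left over.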