5 Notational Conventions
5.1 Syntactic and Lexical Grammars
This section describes the context-free grammars used in this specification to define the lexical and
syntactic structure of an ECMAScript program.
5.1.1 Context-Free Grammars
A context-free grammar consists of a number of productions. Each production has an abstract symbol
called a nonterminal as its left-hand side, and a sequence of zero or more nonterminal and terminal
symbols as its right-hand side. For each grammar, the terminal symbols are drawn from a specified
alphabet.
Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given
context-free grammar specifies a language, namely, the (perhaps infinite) set of possible sequences of
terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-
hand side of a production for which the nonterminal is the left-hand side.
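
The replacement process described above can be sketched in a few lines of ECMAScript. The grammar below is a hypothetical toy (one nonterminal `S`, terminals `a` and `b`) and is not part of this specification; the `derive` helper is likewise an illustrative name. It starts from the goal symbol and repeatedly replaces the leftmost nonterminal with each of its right-hand sides, collecting every sequence that consists of terminals only.

```javascript
// Hypothetical toy grammar (not from the specification):
//   S : a S b
//   S : [empty]
// Keys of the map are nonterminals; every other symbol is a terminal.
const productions = new Map([
  ["S", [["a", "S", "b"], []]],
]);

// Collect every terminal-only sequence that can be reached from `goal`
// using at most `maxSteps` replacements.
function derive(goal, maxSteps) {
  const sentences = new Set();
  let frontier = [[goal]];
  for (let step = 0; step <= maxSteps; step++) {
    const next = [];
    for (const form of frontier) {
      const i = form.findIndex((sym) => productions.has(sym));
      if (i === -1) {
        // No nonterminal left: this sequence belongs to the language.
        sentences.add(form.join(""));
        continue;
      }
      // Replace the nonterminal with each right-hand side in turn.
      for (const rhs of productions.get(form[i])) {
        next.push([...form.slice(0, i), ...rhs, ...form.slice(i + 1)]);
      }
    }
    frontier = next;
  }
  return [...sentences];
}

console.log(derive("S", 3)); // [ '', 'ab', 'aabb' ]
```

The step limit exists only because the language of this toy grammar is infinite; raising it yields longer members of the same set.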
5.1.2 The Lexical and RegExp Grammars
A lexical grammar for ECMAScript is given in clause 7. This grammar has as its terminal symbols the
characters of the Unicode character set. It defines a set of productions, starting from the goal symbol
InputElementDiv or InputElementRegExp, that describe how sequences of Unicode characters are
translated into a sequence of input elements.
Input elements other than white space and comments form the terminal symbols for the syntactic
grammar for ECMAScript and are called ECMAScript tokens. These tokens are the reserved words,
identifiers, literals, and punctuators of the ECMAScript language. Moreover, line terminators, although
not considered to be tokens, also become part of the stream of input elements and guide the process of
automatic semicolon insertion (7.9). Simple white space and single-line comments are discarded and
do not appear in the stream of input elements for the syntactic grammar. A MultiLineComment (that is, a
comment of the form “/*…*/” regardless of whether it spans more than one line) is likewise simply
discarded if it contains no line terminator; but if a MultiLineComment contains one or more line
terminators, then it is replaced by a single line terminator, which becomes part of the stream of input
elements for the syntactic grammar.
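
The difference between the two kinds of MultiLineComment can be observed through automatic semicolon insertion. The sketch below is illustrative only: it builds two function bodies with the Function constructor (so the source is parsed as a function body rather than as a complete program) and relies on the rule that no line terminator may appear between `return` and its expression. The names `sameLine` and `spansLines` are hypothetical.

```javascript
// The source text is built from strings so that the line terminator
// inside the second comment is explicit.
const sameLine = new Function("return /* no line terminator */ 42;");
const spansLines = new Function("return /* line one\n line two */ 42;");

// The first comment is simply discarded, so the body reads: return 42;
console.log(sameLine()); // 42

// The second comment acts as a line terminator, so a semicolon is
// inserted after `return` and the expression 42 is never returned.
console.log(spansLines()); // undefined
```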
A RegExp grammar for ECMAScript is given in 15.10. This grammar also has as its terminal symbols
the characters of the Unicode character set. It defines a set of productions, starting from the goal symbol
Pattern, that describe how sequences of Unicode characters are translated into regular expression
patterns.
Productions of the lexical and RegExp grammars are distinguished by having two colons “::” as
separating punctuation. The lexical and RegExp grammars share some productions.
5.1.3 The Numeric String Grammar
Another grammar is used for translating strings into numeric values. This grammar is similar to the part
of the lexical grammar having to do with numeric literals and has as its terminal symbols the characters
of the Unicode character set. This grammar appears in 9.3.1.
Productions of the numeric string grammar are distinguished by having three colons “:::” as
punctuation.
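
One place where this grammar can be seen at work is the conversion of a string to a number, for example by calling `Number` as a function. The sketch below is illustrative, and the sample strings are chosen for this example only. It shows inputs that convert to a numeric value and one that does not.

```javascript
// Each string is converted to a numeric value by calling Number as a function.
console.log(Number("  42  ")); // 42   (surrounding white space is permitted)
console.log(Number("3.5e2"));  // 350  (decimal literal with an exponent)
console.log(Number("0x1F"));   // 31   (hexadecimal integer literal)
console.log(Number(""));       // 0    (the empty string converts to zero)

// A string that cannot be interpreted as a numeric literal yields NaN.
console.log(Number("42px"));   // NaN
```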
5.1.4 The Syntactic Grammar
The syntactic grammar for ECMAScript is given in clauses 11, 12, 13 and 14. This grammar has
ECMAScript tokens defined by the lexical grammar as its terminal symbols (5.1.2). It defines a set of
productions, starting from the goal symbol Program, that describe how sequences of tokens can form
syntactically correct ECMAScript programs.
When a stream of Unicode characters is to be parsed as an ECMAScript program, it is first converted to
a stream of input elements by repeated application of the lexical grammar; this stream of input elements
is then parsed by a single application of the syntactic grammar. The program is syntactically in error if the
tokens in the stream of input elements cannot be parsed as a single instance of the goal nonterminal
Program, with no tokens left over.
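
The two-stage process can be sketched with a deliberately tiny language. Everything below is a hypothetical illustration, not the ECMAScript grammars themselves: the toy lexical grammar recognises only white space, decimal digits and `+`, and the toy goal symbol is `Sum : Number | Sum + Number`. The function names `tokenize` and `parseSum` are assumptions made for this example.

```javascript
// Stage 1: apply the toy lexical grammar repeatedly until the input is consumed.
function tokenize(source) {
  const inputElement = /\s+|(\d+)|(\+)/y; // sticky: must match exactly at lastIndex
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    inputElement.lastIndex = pos;
    const m = inputElement.exec(source);
    if (m === null) throw new SyntaxError("Unexpected character at " + pos);
    if (m[1] !== undefined) tokens.push({ type: "Number", text: m[1] });
    else if (m[2] !== undefined) tokens.push({ type: "Punctuator", text: m[2] });
    // White space matches neither group and is discarded.
    pos = inputElement.lastIndex;
  }
  return tokens;
}

// Stage 2: a single application of the toy syntactic grammar.
function parseSum(tokens) {
  let i = 0;
  const expectNumber = () => {
    const t = tokens[i];
    if (!t || t.type !== "Number") throw new SyntaxError("Expected a number at token " + i);
    i++;
    return Number(t.text);
  };
  let total = expectNumber();
  while (i < tokens.length && tokens[i].text === "+") {
    i++;
    total += expectNumber();
  }
  // The input is in error if any tokens are left over.
  if (i !== tokens.length) throw new SyntaxError("Tokens left over after token " + i);
  return total;
}

console.log(parseSum(tokenize("1 + 2 + 39"))); // 42
```

Passing `"1 2"` instead makes the second stage throw a `SyntaxError`, because the token `2` remains after a complete `Sum` has been parsed.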