© Ecma International 2011
Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given
context-free grammar specifies a language, namely, the (perhaps infinite) set of possible sequences of
terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand
side of a production for which the nonterminal is the left-hand side.
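The following Python sketch makes the derivation process concrete. It enumerates the language of a small, hypothetical grammar (S → a S b | a b, which is not part of ECMAScript) by repeatedly replacing a nonterminal with one of its right-hand sides. The names `GRAMMAR` and `language` are illustrative only. The sketch always expands the leftmost nonterminal; for a context-free grammar this yields the same set of terminal sequences as expanding in any other order. Because the language is infinite, the sketch keeps only sentences of at most `max_len` terminals. That cut-off is guaranteed to terminate only for grammars, like this one, in which every expansion adds at least one terminal.

```python
from collections import deque

# Hypothetical toy grammar (not ECMAScript): S -> a S b | a b.
# Dict keys are nonterminals; every other symbol is a terminal.
GRAMMAR = {
    "S": [["a", "S", "b"], ["a", "b"]],
}

def language(goal, max_len):
    """Return every terminal string of at most max_len symbols derivable from goal."""
    results = set()
    start = (goal,)
    queue = deque([start])
    seen = {start}
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal; if there is none, form is a sentence.
        idx = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in GRAMMAR[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            # Terminals are never replaced, so a form with too many can be dropped.
            terminals = sum(sym not in GRAMMAR for sym in new_form)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results
```

Here `language("S", 6)` returns `{"ab", "aabb", "aaabbb"}`, the first three members of the infinite language aⁿbⁿ.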
5.1.2 The Lexical and RegExp Grammars
A lexical grammar for ECMAScript is given in clause 7. This grammar has as its terminal symbols characters
(Unicode code units) that conform to the rules for SourceCharacter defined in Clause 6. It defines a set of
productions, starting from the goal symbol InputElementDiv or InputElementRegExp, that describe how
sequences of such characters are translated into a sequence of input elements.
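Two goal symbols are needed because the characters alone cannot tell whether `/` begins a division punctuator or a RegularExpressionLiteral. Clause 7 uses InputElementDiv where a division (`/`) or division-assignment (`/=`) operator is permitted and InputElementRegExp elsewhere, so the choice depends on syntactic context. The Python sketch below approximates that choice with a common previous-token heuristic. It is an assumption, not the specified mechanism, and it is known to misclassify some inputs, such as a regular expression that follows the `)` of an `if (…)` header. The function name and keyword set are hypothetical.

```python
# Goal-symbol names from clause 7; the selection logic below is a heuristic.
DIV = "InputElementDiv"
REGEXP = "InputElementRegExp"

# Keywords after which an expression (and so a regular expression) may start.
# Illustrative, not exhaustive.
EXPRESSION_KEYWORDS = {
    "return", "typeof", "instanceof", "in", "new", "delete",
    "void", "throw", "case", "do", "else",
}

def choose_goal(prev_token):
    """Guess the lexical goal symbol from the previous significant token."""
    if prev_token is None:
        return REGEXP  # start of input: an expression may begin here
    if prev_token in (")", "]", "}"):
        return DIV  # heuristic; wrong after e.g. `if (x)` or a block's `}`
    if prev_token in EXPRESSION_KEYWORDS:
        return REGEXP
    if prev_token[0].isalnum() or prev_token[0] in "_$'\"":
        return DIV  # identifier, number or string: an operand just ended
    return REGEXP  # other punctuators such as `=`, `(` and `,`
```

For `a = b / c` the `/` follows `b`, so `choose_goal("b")` selects InputElementDiv; in `a = /c/` it follows `=`, and InputElementRegExp is chosen.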
Input elements other than white space and comments form the terminal symbols for the syntactic grammar for
ECMAScript and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and
punctuators of the ECMAScript language. Moreover, line terminators, although not considered to be tokens,
also become part of the stream of input elements and guide the process of automatic semicolon insertion (7.9).
Simple white space and single-line comments are discarded and do not appear in the stream of input
elements for the syntactic grammar. A MultiLineComment (that is, a comment of the form "/*…*/" regardless
of whether it spans more than one line) is likewise simply discarded if it contains no line terminator; but if a
MultiLineComment contains one or more line terminators, then it is replaced by a single line terminator, which
becomes part of the stream of input elements for the syntactic grammar.
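The filtering rules above can be sketched in Python as a pass over input elements that a lexer has already produced. The `(kind, text)` tuple representation and the function name `to_syntactic_stream` are hypothetical. The kind strings reuse nonterminal names from clause 7, with `Token` covering every token. The line-terminator set is LF, CR, LS and PS, as in 7.3.

```python
# The four LineTerminator code units: LF, CR, LS, PS.
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}

def to_syntactic_stream(elements):
    """Filter lexer output into the input-element stream for the syntactic grammar."""
    out = []
    for kind, text in elements:
        if kind in ("WhiteSpace", "SingleLineComment"):
            continue  # discarded entirely
        if kind == "MultiLineComment":
            if any(ch in LINE_TERMINATORS for ch in text):
                # The whole comment becomes one line terminator (LF is an
                # arbitrary choice), which can later drive semicolon insertion.
                out.append(("LineTerminator", "\n"))
            continue  # a comment without line terminators simply vanishes
        out.append((kind, text))  # tokens and line terminators survive
    return out
```

For the source `a /* x` followed by a line break and `*/ b // c`, the white space and the single-line comment disappear, and the multi-line comment collapses to one LineTerminator between the tokens `a` and `b`.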
A RegExp grammar for ECMAScript is given in 15.10. This grammar also has as its terminal symbols the
characters as defined by SourceCharacter. It defines a set of productions, starting from the goal symbol Pattern,
that describe how sequences of characters are translated into regular expression patterns.
Productions of the lexical and RegExp grammars are distinguished by having two colons "::" as separating
punctuation. The lexical and RegExp grammars share some productions.
5.1.3 The Numeric String Grammar
Another grammar is used for translating Strings into numeric values. This grammar is similar to the part of the
lexical grammar having to do with numeric literals and has as its terminal symbols SourceCharacter. This
grammar appears in 9.3.1.
Productions of the numeric string grammar are distinguished by having three colons ":::" as punctuation.
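As an illustration of this grammar's effect, the following Python sketch converts a String to a Number in the spirit of 9.3.1. It uses regular expressions in place of the ::: productions and is deliberately simplified. Its white-space set is only a subset of StrWhiteSpaceChar, because other Unicode space separators are omitted. Python floats stand in for Number values, and `string_to_number` is a hypothetical name. The sketch reflects these properties of the grammar:

- surrounding white space is allowed;
- an empty or all-white-space string yields 0;
- a sign is permitted on decimal literals and on Infinity, but not on hexadecimal literals;
- any string the grammar cannot interpret yields NaN.

```python
import math
import re

# Assumed subset of StrWhiteSpaceChar: common WhiteSpace plus the four
# LineTerminators (the full rule also admits other Unicode space separators).
STR_WHITESPACE = "\t\v\f \u00a0\ufeff\n\r\u2028\u2029"

# StrDecimalLiteral: optional sign, then Infinity or an unsigned decimal with
# optional fraction and exponent. [0-9] avoids Python's Unicode-aware \d.
_DECIMAL = re.compile(
    r"[+-]?(?:Infinity"
    r"|[0-9]+\.?[0-9]*(?:[eE][+-]?[0-9]+)?"
    r"|\.[0-9]+(?:[eE][+-]?[0-9]+)?)"
)
# HexIntegerLiteral: 0x or 0X followed by hex digits; no sign permitted.
_HEX = re.compile(r"0[xX][0-9a-fA-F]+")

def string_to_number(s):
    """Simplified sketch of converting a String to a Number (see 9.3.1)."""
    body = s.strip(STR_WHITESPACE)
    if body == "":
        return 0.0  # empty or all-white-space string converts to 0
    if _HEX.fullmatch(body):
        try:
            return float(int(body, 16))  # int() accepts the 0x prefix in base 16
        except OverflowError:
            return math.inf  # too large for a double
    if _DECIMAL.fullmatch(body):
        return float(body)  # float() accepts these forms, including Infinity
    return math.nan  # not an expansion of the grammar
```

With this sketch, `"  42\n"` converts to 42, `"0x1F"` to 31 and `".5e1"` to 5, while `"-0x1F"` and `"12px"` both convert to NaN.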
5.1.4 The Syntactic Grammar
The syntactic grammar for ECMAScript is given in clauses 11, 12, 13 and 14. This grammar has ECMAScript
tokens defined by the lexical grammar as its terminal symbols (5.1.2). It defines a set of productions, starting
from the goal symbol Program, that describe how sequences of tokens can form syntactically correct
ECMAScript programs.
When a stream of characters is to be parsed as an ECMAScript program, it is first converted to a stream of
input elements by repeated application of the lexical grammar; this stream of input elements is then parsed by
a single application of the syntactic grammar. The program is syntactically in error if the tokens in the stream
of input elements cannot be parsed as a single instance of the goal nonterminal Program, with no tokens left
over.
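The two-phase process, and the requirement that no tokens be left over, can be sketched in Python with a toy language. The grammar (Sum : Number | Sum + Number) and the names `lex` and `parse_program` are hypothetical; they stand in for the ECMAScript lexical grammar and the goal symbol Program. The left-recursive production is handled with a loop, as is usual in recursive-descent parsers, and the parser returns the computed sum so its result is easy to check.

```python
import re

# Hypothetical toy language: Sum : Number | Sum + Number.
# Leading white space before each token is consumed and discarded.
_TOKEN = re.compile(r"\s*(?:([0-9]+)|(\+))")

def lex(source):
    """Phase 1: apply the lexical rules repeatedly to produce a token list."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        m = _TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens

def parse_program(source):
    """Phase 2: a single parse of the goal symbol Sum over all the tokens."""
    tokens = lex(source)
    pos = 0

    def expect_number():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            pos += 1
            return int(tokens[pos - 1])
        raise SyntaxError(f"expected a number at token {pos}")

    total = expect_number()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        total += expect_number()
    if pos != len(tokens):
        # A valid Sum was recognized, but tokens remain: still a syntax error.
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return total
```

`parse_program("1 + 2 + 3")` returns 6, while `parse_program("1 + 2 3")` raises SyntaxError. The prefix `1 + 2` is a valid Sum, but the token `3` is left over.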
Productions of the syntactic grammar are distinguished by having just one colon ":" as punctuation.
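Because the separator after the left-hand side identifies the grammar, a production written on one line can be classified mechanically. This Python sketch is illustrative only: `grammar_of` is a hypothetical name, and the sketch assumes the separator is written as a separate word. It examines only the word that follows the left-hand side, because a right-hand side can itself contain `:` as a terminal, as in the conditional operator.

```python
# Separator punctuation -> grammar, following 5.1.2 to 5.1.4.
SEPARATOR_TO_GRAMMAR = {
    ":::": "numeric string grammar",
    "::": "lexical or RegExp grammar",
    ":": "syntactic grammar",
}

def grammar_of(production):
    """Classify a one-line production such as 'Pattern :: Disjunction'."""
    words = production.split()
    if len(words) < 2 or words[1] not in SEPARATOR_TO_GRAMMAR:
        raise ValueError(f"no grammar separator after left-hand side: {production!r}")
    # Only the word after the left-hand side matters; a ':' later in the
    # right-hand side (e.g. in ConditionalExpression) is a terminal symbol.
    return SEPARATOR_TO_GRAMMAR[words[1]]
```

For example, `HexIntegerLiteral ::: 0x HexDigit` is classified as numeric string grammar and `Pattern :: Disjunction` as lexical or RegExp grammar. `ConditionalExpression : LogicalORExpression ? AssignmentExpression : AssignmentExpression` is classified as syntactic grammar, even though its right-hand side contains a colon.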
The syntactic grammar as presented in clauses 11, 12, 13 and 14 is not a complete account of which
token sequences are accepted as correct ECMAScript programs. Certain additional token sequences are also
accepted, namely, those that would be described by the grammar if only semicolons were added to the
sequence in certain places (such as before line terminator characters). Furthermore, certain token sequences
that are described by the grammar are not considered acceptable if a line terminator character appears in certain
"awkward" places.
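One of these "awkward" places is a restricted production, in which the grammar contains the annotation [no LineTerminator here]. For example, a line terminator directly after `return` causes a semicolon to be inserted, so an expression on the next line is not returned. The Python sketch below models only that rule, for `return`, `break` and `continue`, which is a small part of 7.9. The `(text, preceded_by_line_terminator)` token representation and the function name are hypothetical.

```python
# Keywords whose productions forbid a line terminator before what follows.
RESTRICTED_KEYWORDS = {"return", "break", "continue"}

def insert_restricted_semicolons(tokens):
    """Insert ';' where a line terminator follows return, break or continue.

    tokens: list of (text, preceded_by_line_terminator) pairs.
    Only the restricted-production rule is modelled; ordinary semicolon
    insertion before an offending token, before '}' and at end of input
    is not handled.
    """
    out = []
    for text, newline_before in tokens:
        if out and out[-1] in RESTRICTED_KEYWORDS and newline_before and text != ";":
            out.append(";")  # the restricted token may not continue the statement
        out.append(text)
    return out
```

When `return` is followed by a line break and then `a + b`, the sketch yields `return ; a + b`. The remaining rules of 7.9 would then also terminate `a + b` with its own semicolon.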