3 A simple Example: How to work with JFlex
The code included in %{...%} is copied verbatim into the generated lexer class source. Here
you can declare member variables and functions that are used inside scanner actions. In our
example we declare a StringBuffer “string” in which we will store parts of string literals and
two helper functions “symbol” that create java_cup.runtime.Symbol objects with position
information of the current token (see section 8.1 JFlex and CUP for how to interface with the
parser generator CUP). As with JFlex options, both %{ and %} must begin a line.
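As a rough illustration of what such a block contributes to the generated class, the following plain-Java sketch mirrors the members described above. TokenSymbol and ScannerMembersSketch are hypothetical stand-ins (for java_cup.runtime.Symbol and the generated lexer class), and the position fields yyline and yycolumn are set by hand here instead of by a generated scanner.

```java
// Hypothetical stand-in for java_cup.runtime.Symbol, so the sketch compiles without CUP.
final class TokenSymbol {
    final int type;
    final int line;
    final int column;
    final Object value;

    TokenSymbol(int type, int line, int column, Object value) {
        this.type = type;
        this.line = line;
        this.column = column;
        this.value = value;
    }
}

// Hypothetical stand-in for the generated lexer class.
final class ScannerMembersSketch {
    // Position of the current token; assumed to be maintained by the scanner.
    int yyline;
    int yycolumn;

    // Buffer in which parts of a string literal are collected.
    final StringBuffer string = new StringBuffer();

    // Helper for tokens that carry no value.
    TokenSymbol symbol(int type) {
        return new TokenSymbol(type, yyline, yycolumn, null);
    }

    // Helper for tokens that carry a value, such as a completed string literal.
    TokenSymbol symbol(int type, Object value) {
        return new TokenSymbol(type, yyline, yycolumn, value);
    }

    public static void main(String[] args) {
        ScannerMembersSketch scanner = new ScannerMembersSketch();
        scanner.yyline = 3;
        scanner.yycolumn = 7;
        scanner.string.append("hello");
        TokenSymbol token = scanner.symbol(1, scanner.string.toString());
        System.out.println(token.value + " at " + token.line + ":" + token.column);
    }
}
```

Both helpers read the same position fields, so a scanner action only has to supply the token type and, where needed, a value.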
The specification continues with macro declarations. Macros are abbreviations for regular
expressions, used to make lexical specifications easier to read and understand. A macro
declaration consists of a macro identifier followed by =, then followed by the regular expression
it represents. This regular expression may itself contain macro usages. Although this allows a
grammar-like specification style, macros are still just abbreviations and not non-terminals:
they cannot be recursive or mutually recursive. Cycles in macro definitions are detected and
reported at generation time by JFlex.
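To make the "abbreviation" view concrete, here is a minimal Java sketch of textual macro expansion with cycle detection. It is an illustration only and not how JFlex is implemented: the class MacroExpansionSketch, the {Name} usage syntax it recognizes, and the WhiteSpace macro in the usage example are assumptions made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: expands macro usages textually and rejects cyclic definitions.
final class MacroExpansionSketch {
    // Assumed usage syntax: a macro identifier in braces, e.g. {LineTerminator}.
    private static final Pattern USAGE =
        Pattern.compile("\\{([A-Za-z_][A-Za-z_0-9]*)\\}");

    static String expand(String name, Map<String, String> macros) {
        return expand(name, macros, new ArrayDeque<>());
    }

    private static String expand(String name, Map<String, String> macros,
                                 Deque<String> inProgress) {
        String definition = macros.get(name);
        if (definition == null) {
            throw new IllegalArgumentException("undefined macro: " + name);
        }
        // A macro that is reached again while still being expanded forms a cycle.
        if (inProgress.contains(name)) {
            throw new IllegalStateException("cycle in macro definition: " + name);
        }
        inProgress.push(name);
        Matcher m = USAGE.matcher(definition);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(definition, last, m.start());
            // Parenthesize the expansion so it stays one unit in the result.
            out.append('(').append(expand(m.group(1), macros, inProgress)).append(')');
            last = m.end();
        }
        out.append(definition, last, definition.length());
        inProgress.pop();
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> macros = new LinkedHashMap<>();
        macros.put("LineTerminator", "\\r|\\n|\\r\\n");
        macros.put("WhiteSpace", "{LineTerminator}|[ \\t\\f]");
        System.out.println(expand("WhiteSpace", macros));
    }
}
```

Because every usage is replaced by a finite piece of text, a definition that refers back to itself, directly or through other macros, can never be fully expanded, which is why it has to be rejected.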
Here are some of the example macros in more detail:
• LineTerminator stands for the regular expression that matches an ASCII CR, an ASCII
LF, or a CR followed by LF.
• InputCharacter stands for all characters that are not a CR or LF.
• TraditionalComment is the expression that matches the string "/*" followed by a
character that is not a *, followed by anything that does not contain, but ends in "*/".
As this would not match comments like /****/, we add "/*" followed by an arbitrary
number (at least one) of "*" followed by the closing "/". This is not the only, but one
of the simpler expressions matching non-nesting Java comments. It is tempting to just
write something like the expression "/*" .* "*/", but this would match more than we
want. It would for instance match the whole of /* */ x = 0; /* */, instead of two
comments and four real tokens. See DocumentationComment and CommentContent for
an alternative.
• CommentContent matches zero or more occurrences of any character except a * or any
number of * followed by a character that is not a /.
• Identifier matches each string that starts with a character of class jletter followed
by zero or more characters of class jletterdigit. jletter and jletterdigit are
predefined character classes. jletter includes all characters for which the Java function
Character.isJavaIdentifierStart returns true, and jletterdigit all characters for
which Character.isJavaIdentifierPart returns true.
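The behaviour of the macros in this list can be tried out with java.util.regex. The sketch below is an approximation under stated assumptions, not the JFlex definitions themselves: JFlex regular expression syntax differs from java.util.regex, JFlex picks the longest match while java.util.regex tries alternatives from left to right, and the class and constant names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Approximations of the macros described above, written for java.util.regex.
final class MacroBehaviourSketch {
    // LineTerminator: CR LF is listed first because java.util.regex takes the
    // first alternative that matches, not the longest one.
    static final Pattern LINE_TERMINATOR = Pattern.compile("\\r\\n|\\r|\\n");

    // The tempting "/*" .* "*/": the greedy .* runs to the last "*/" on the line.
    static final Pattern GREEDY_COMMENT = Pattern.compile("/\\*.*\\*/");

    // TraditionalComment: "/*", a character that is not *, then everything up to
    // the first "*/"; or "/*", one or more "*", and the closing "/".
    static final Pattern TRADITIONAL_COMMENT =
        Pattern.compile("/\\*[^*].*?\\*/|/\\*\\*+/", Pattern.DOTALL);

    // CommentContent: any character except *, or a run of * followed by a
    // character that is neither / nor *, repeated zero or more times.
    static final Pattern COMMENT_CONTENT = Pattern.compile("(?:[^*]|\\*+[^/*])*");

    // Counts the non-overlapping matches of a pattern in the input.
    static int count(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Identifier: one jletter followed by zero or more jletterdigit characters.
    // Works on char values only; supplementary code points are ignored here.
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String input = "/* */ x = 0; /* */";
        System.out.println("greedy matches:      " + count(GREEDY_COMMENT, input));
        System.out.println("traditional matches: " + count(TRADITIONAL_COMMENT, input));
        System.out.println("_x1 is identifier:   " + isIdentifier("_x1"));
    }
}
```

On the input /* */ x = 0; /* */ the greedy pattern finds a single match spanning the whole line, while the TraditionalComment approximation finds the two comments separately.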
The last part of the second section in our lexical specification is a lexical state declaration:
%state STRING declares a lexical state STRING that can be used in the “lexical rules” part
of the specification. A state declaration is a line starting with %state followed by a
space- or comma-separated list of state identifiers. There can be more than one line starting with
%state.
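The shape of a state declaration line can be sketched in Java as follows. This only illustrates the syntax rule stated above and is not JFlex's own parser; the class StateDeclarationSketch and the state names other than STRING are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: extracts the state identifiers from one %state line.
final class StateDeclarationSketch {
    // Returns the declared identifiers, or an empty list if the line
    // is not a %state declaration.
    static List<String> declaredStates(String line) {
        List<String> states = new ArrayList<>();
        if (!line.startsWith("%state")) {
            return states;
        }
        String rest = line.substring("%state".length());
        // Reject other words that merely begin with "%state".
        if (!rest.isEmpty() && !Character.isWhitespace(rest.charAt(0))) {
            return states;
        }
        // Identifiers are separated by spaces or commas.
        for (String id : rest.trim().split("[\\s,]+")) {
            if (!id.isEmpty()) {
                states.add(id);
            }
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(declaredStates("%state STRING"));
        System.out.println(declaredStates("%state STRING, CHARLITERAL COMMENT"));
    }
}
```

Since several %state lines are allowed, a caller would collect the results of all such lines into one set of state identifiers.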