Two op
erators allowed in a character class are the hyphen (“-”) and circumflex (“^”). When used
between two characters the hyphen represents a range of characters. The circumflex, when used
as the first character, negates the expression. If two patterns match the same string, the longest
match wins. In case both matches are the same length, then the first pattern listed is used.
... definitions ...
%%
... rules ...
%%
... subroutines ...
Input to Lex is divided into three sections with %% dividing the sections. This is best illustrated by
example. The first example is the shortest possible lex file:
%%
Input is copied to output one character at a time. The first %% is always required, as there must
always be a rules section. However if we don’t specify any rules then the default action is to
match everything and copy it to output. Defaults for input and output are stdin and stdout,
respectively. Here is the same example with defaults explicitly coded:
%%
/* match everything except newline */
. ECHO;
/* match newline */
\n ECHO;
%%
int yywrap(void) {
return 1;
}
int main(void) {
yylex();
return 0;
}
Two patterns have been specified in the rules section. Each pattern must begin in column one.
This is followed by whitespace (space, tab or newline) and an optional action associated with the
pattern. The action may be a single C statement, or multiple C statements, enclosed in braces.
Anything not starting in column one is copied verbatim to the generated C file. We may take
advantage of this behavior to specify comments in our lex file. In this example there are two
patterns, “.” and “\n”, with an ECHO action associated for each pattern. Several macros and
variables are predefined by lex. ECHO is a macro that writes code matched by the pattern. This is
the default action for any unmatched strings. Typically, ECHO is defined as:
#define ECHO fwrite(yytext, yyleng, 1, yyout)
Variable yytext is a pointer to the matched string (NULL-terminated) and yyleng is the length of
the matched string. Variable yyout is the output file and defaults to stdout. Function yywrap is
called by lex when input is exhausted. Return 1 if you are done or 0 if more processing is
required. Every C program requires a main function. In this case we simply call yylex that is the
main entry-point for lex. Some implementations of lex include copies of main and yywrap in a
library thus eliminating the need to code them explicitly. This is why our first example, the shortest
lex program, functioned properly.
8