flex: Fast Lexical Analyzer Generator Manual


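"flex is a fast lexical analyzer generator, used to create programs that perform pattern matching on text. It uses extended regular expressions and provides features for controlling the input source, handling multiple input buffers, context-sensitive scanning, and interfacing with other tools such as yacc. This manual consists of a tutorial part and a reference part, covering everything from basic concepts to advanced features."

`flex` is a widely used open-source tool for generating lexical analyzers (also called scanners). A lexical analyzer is the first stage of a compiler or parser and is responsible for recognizing the tokens in the input source. `flex` aims to generate efficient, flexible scanners so that developers can concentrate on defining the rules of their language rather than on low-level implementation details.

1. **Introduction** `flex` reads a user-written input file (usually with the extension `.l`) that contains pattern-matching rules and their associated actions. From these rules it generates a C source file, which can be compiled into an executable scanner.
2. **Format of the input file** The rules section of the input file consists of a series of rules, each pairing a pattern with an action. A pattern is a regular expression that is matched against the input; an action is a fragment of C code executed when the pattern matches.
3. **Extended regular expressions** `flex` uses an extended set of regular expressions, including character classes, quantifiers, grouping, and quoting, which lets users define complex patterns.
4. **How the input is matched** Matching follows the longest-match principle. When several patterns match the input, the one matching the most text is chosen; if two or more matches have the same length, the rule listed first in the input file wins.
5. **Actions** An action can be a simple print statement or a call to another function. When a pattern matches, its action is executed, which is how custom behavior is implemented.
6. **The generated scanner** The scanner generated by `flex` manages its input stream itself. It supports redirecting the input source, handling multiple input buffers, and even scanning strings held in memory.
7. **Start conditions** Start conditions make the scanner context-sensitive: defining different "states" changes which rules are active. This is particularly useful for nested constructs or for input that mixes several sub-languages.
8. **Multiple input buffers** For applications that read from several input sources, `flex` provides functions for managing multiple input buffers and switching between them.
9. **End-of-file rules** The special `<<EOF>>` rule handles the end of the input, for example to report an error when an expected closing delimiter has not been found.
10. **Macros and values available to the user** Macros and variables can be used inside actions. They provide information such as the line number, the input position, and the matched text, which helps with error reporting during lexical analysis.
11. **Interfacing with yacc** `flex` works well with the parser generator `yacc` (or its modern counterpart `bison`): the scanner passes its tokens to the parser, and together they form a complete compiler front end.

`flex` is a powerful tool for building parsers and compilers. Its flexibility lets developers build efficient scanners quickly for a wide range of programming languages and data formats.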

HOW THE INPUT IS MATCHED

When the generated scanner is run, it analyzes its input
looking for strings which match any of its patterns. If it
finds more than one match, it takes the one matching the
most text (for trailing context rules, this includes the
length of the trailing part, even though it will then be
returned to the input). If it finds two or more matches of
the same length, the rule listed first in the flex input
file is chosen.
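
The selection rule just described can be imitated in a few lines of
plain C. This is only a sketch of the idea, not the code flex
generates: the rule_fn type, the two sample rules (the keyword "if"
and a lowercase identifier) and pick_rule() are all hypothetical names
introduced for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule type: returns how many characters the rule matches
   at the start of s, or 0 if it does not match there. */
typedef size_t (*rule_fn)(const char *s);

/* Rule 0: the literal keyword "if". */
static size_t match_kw_if(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Rule 1: an identifier, i.e. one or more lowercase letters. */
static size_t match_ident(const char *s) {
    size_t n = 0;
    while (islower((unsigned char)s[n])) n++;
    return n;
}

/* Rules in the order they would be listed in the input file. */
static const rule_fn rules[] = { match_kw_if, match_ident };

/* Return the index of the winning rule, or -1 if no rule matches.
   The longest match wins; on a tie the earlier rule is kept, because
   only a strictly longer match replaces the current best. */
int pick_rule(const char *s, size_t *len_out) {
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t len = rules[i](s);
        if (len > best_len) {
            best = (int)i;
            best_len = len;
        }
    }
    if (len_out) *len_out = best_len;
    return best;
}
```

With these two rules, the input "if" is a tie of length 2 and goes to
the keyword rule because it is listed first, while "iffy" goes to the
identifier rule because that match is longer.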

Once the match is determined, the text corresponding to the
match (called the token) is made available in the global
character pointer yytext, and its length in the global
integer yyleng. The action corresponding to the matched
pattern is then executed (a more detailed description of
actions follows), and then the remaining input is scanned
for another match.

If no match is found, then the default rule is executed: the
next character in the input is considered matched and copied
to the standard output. Thus, the simplest legal flex input
is:

    %%

which generates a scanner that simply copies its input (one
character at a time) to its output.

Note that yytext can be defined in two different ways:
either as a character pointer or as a character array. You
can control which definition flex uses by including one of
the special directives %pointer or %array in the first
(definitions) section of your flex input. The default is
%pointer, unless you use the -l lex compatibility option, in
which case yytext will be an array. The advantage of using
%pointer is substantially faster scanning and no buffer
overflow when matching very large tokens (unless you run out
of dynamic memory). The disadvantage is that you are
restricted in how your actions can modify yytext (see the next
section), and calls to the unput() function destroy the
present contents of yytext, which can be a considerable
porting headache when moving between different lex versions.

The advantage of %array is that you can then modify yytext
to your heart's content, and calls to unput() do not destroy
yytext (see below). Furthermore, existing lex programs
sometimes access yytext externally using declarations of the
form:

    extern char yytext[];

This definition is erroneous when used with %pointer, but
correct for %array.
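
Why the extern declaration must agree with the definition can be seen
from a small C sketch. It relies only on the language rule that an
array object and a pointer object are different things: the array
holds the characters themselves, the pointer holds an address. The
names sketch_array, sketch_pointer and SKETCH_YYLMAX are hypothetical,
and the size is an arbitrary illustrative value, not flex's default.

```c
#include <stddef.h>

#define SKETCH_YYLMAX 8192            /* illustrative size only */

char  sketch_array[SKETCH_YYLMAX];    /* models yytext under %array   */
char *sketch_pointer = sketch_array;  /* models yytext under %pointer */

/* Size of the array object: room for all of its characters. */
size_t array_object_size(void)   { return sizeof sketch_array; }

/* Size of the pointer object: just one address. */
size_t pointer_object_size(void) { return sizeof sketch_pointer; }
```

A second source file that declares an array while the scanner defines
a pointer would treat the bytes of the address as if they were token
characters, which is undefined behavior in C.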

%array defines yytext to be an array of YYLMAX characters,
which defaults to a fairly large value. You can change the
size by simply #define'ing YYLMAX to a different value in
the first section of your flex input. As mentioned above,
with %pointer yytext grows dynamically to accommodate large
tokens. While this means your %pointer scanner can accommodate
very large tokens (such as matching entire blocks of
comments), bear in mind that each time the scanner must
resize yytext it also must rescan the entire token from the
beginning, so matching such tokens can prove slow. yytext
presently does not dynamically grow if a call to unput()
results in too much text being pushed back; instead, a
run-time error results.

Also note that you cannot use %array with C++ scanner
classes (the c++ option; see below).

ACTIONS

Each pattern in a rule has a corresponding action, which can
be any arbitrary C statement. The pattern ends at the first
non-escaped whitespace character; the remainder of the line
is its action. If the action is empty, then when the pattern
is matched the input token is simply discarded. For
example, here is the specification for a program which
deletes all occurrences of "zap me" from its input:

    %%
    "zap me"

(It will copy all other characters in the input to the output
since they will be matched by the default rule.)
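
As a rough picture of what such a scanner does at run time, the
following hand-written C function imitates the two rules involved: the
"zap me" rule with its empty action and the default rule. It is a
sketch, not flex output; zap_scan() is a hypothetical name, and it
writes to a caller-supplied buffer instead of the standard output so
that the result is easy to inspect.

```c
#include <stddef.h>
#include <string.h>

/* Copy `in` to `out`, dropping every occurrence of "zap me".
   `out` must have room for strlen(in) + 1 bytes. */
void zap_scan(const char *in, char *out) {
    static const char pat[] = "zap me";
    const size_t pat_len = sizeof pat - 1;

    while (*in != '\0') {
        if (strncmp(in, pat, pat_len) == 0) {
            in += pat_len;      /* rule matched, empty action: discard */
        } else {
            *out++ = *in++;     /* default rule: echo one character */
        }
    }
    *out = '\0';
}
```

Calling zap_scan("a zap me b", buf) leaves "a  b" (two blanks) in buf:
only the matched token is removed, and everything around it is echoed
unchanged.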

Here is a program which compresses multiple blanks and tabs
down to a single blank, and throws away whitespace found at
the end of a line:

    %%
    [ \t]+        putchar( ' ' );
    [ \t]+$       /* ignore this token */
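
The effect of these two rules can be approximated by a short
hand-written C function. This is a sketch under two assumptions: that
$ means "immediately followed by a newline", so a run of blanks at the
very end of input without a newline is not treated as trailing, and
that output goes to a buffer rather than the standard output.
squeeze_blanks() is a hypothetical name.

```c
/* Rewrite `in` into `out`, which must hold strlen(in) + 1 bytes. */
void squeeze_blanks(const char *in, char *out) {
    while (*in != '\0') {
        if (*in == ' ' || *in == '\t') {
            /* Consume the whole run, as [ \t]+ would. */
            while (*in == ' ' || *in == '\t') in++;
            if (*in != '\n') {
                *out++ = ' ';   /* [ \t]+ : emit a single blank */
            }
            /* otherwise [ \t]+$ applies: the run precedes a newline
               and is dropped */
        } else {
            *out++ = *in++;     /* default rule: echo one character */
        }
    }
    *out = '\0';
}
```

In the flex version the second rule wins at the end of a line because
its trailing context counts toward the match length, which makes it
the longer match.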

If the action contains a '{', then the action spans till the
balancing '}' is found, and the action may cross multiple
lines. flex knows about C strings and comments and won't be
fooled by braces found within them, but also allows actions
to begin with %{ and will consider the action to be all the
text up to the next %} (regardless of ordinary braces inside
the action).

An action consisting solely of a vertical bar ('|') means
"same as the action for the next rule." See below for an
illustration.

Actions can include arbitrary C code, including return
statements to return a value to whatever routine called
yylex(). Each time yylex() is called it continues processing
tokens from where it last left off until it either reaches
the end of the file or executes a return.
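
The calling pattern described here, where each call resumes from the
point at which the previous one stopped, can be modelled with a small
hand-written scanner. This is a sketch only: sketch_yylex(),
sketch_set_input() and the TOK_* codes are hypothetical names, and the
real yylex() reads from an input stream rather than from a string.

```c
#include <ctype.h>

/* Hypothetical token codes; 0 marks the end of the input. */
enum { TOK_EOF = 0, TOK_NUMBER = 1, TOK_WORD = 2 };

static const char *scan_pos = "";   /* where the next call resumes */

/* Point the sketch scanner at a new NUL-terminated input string. */
void sketch_set_input(const char *s) { scan_pos = s; }

/* Return the next token code, or TOK_EOF at the end of the input. */
int sketch_yylex(void) {
    while (*scan_pos != '\0') {
        unsigned char c = (unsigned char)*scan_pos;
        if (isdigit(c)) {
            while (isdigit((unsigned char)*scan_pos)) scan_pos++;
            return TOK_NUMBER;  /* action containing a return */
        }
        if (isalpha(c)) {
            while (isalpha((unsigned char)*scan_pos)) scan_pos++;
            return TOK_WORD;    /* action containing a return */
        }
        scan_pos++;             /* action without a return: keep scanning */
    }
    return TOK_EOF;
}
```

After sketch_set_input("ab 12 c"), successive calls return TOK_WORD,
TOK_NUMBER, TOK_WORD and then TOK_EOF. The blanks are consumed by the
branch that does not return, so the caller never sees them.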

Actions are free to modify yytext except for lengthening it
(adding characters to its end--these will overwrite later
characters in the input stream). This however does not
apply when using %array (see above); in that case, yytext
may be freely modified in any way.
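
The restriction on lengthening follows from where the token lives. The
sketch below assumes the %pointer arrangement, in which the token text
sits inside the scanner's input buffer and is directly followed by the
input not yet scanned. Both function names are hypothetical and the
buffer is an ordinary C string standing in for that input buffer.

```c
#include <ctype.h>
#include <stddef.h>

/* Upper-case the first `len` characters at `tok` in place. The token
   keeps its length, so the input that follows it is untouched. */
void upcase_token(char *tok, size_t len) {
    for (size_t i = 0; i < len; i++)
        tok[i] = (char)toupper((unsigned char)tok[i]);
}

/* Deliberately "lengthen" the token by one character. Because the
   token lies inside the input buffer, this overwrites the next
   unread input character. */
void lengthen_token(char *tok, size_t len, char extra) {
    tok[len] = extra;
}
```

With the buffer "foo bar" and a three-character token, upcase_token()
yields "FOO bar", while lengthen_token() with '!' yields "FOO!bar":
the blank that was still waiting to be scanned is gone.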
