没有合适的资源?快使用搜索试试~ 我知道了~
首页Efficient String Matching : An Aid to Bibliographic Search
Efficient String Matching : An Aid to Bibliographic SearchEfficient String Matching : An Aid to Bibliographic SearchEfficient String Matching : An Aid to Bibliographic SearchEfficient String Matching : An Aid to Bibliographic SearchEfficient String Matching : An Aid to Bibliographic SearchEfficient String Matching : An Aid to Bibliographic SearchEfficient String Matching : An Aid to Bibliographic Search
资源详情
资源评论
资源推荐
Programming Glenn Manacher
Techniques Editor
Efficient
String Matching:
An Aid to
Bibliographic Search
Alfred V. Aho and Margaret J. Corasick
Bell Laboratories
This paper describes a simple, efficient algorithm to
locate all occurrences of any of a finite number of key-
words in a string of text. The algorithm consists of con-
structing a finite state pattern matching machine from the
keywords and then using the pattern matching machine
to process the text string in a single pass. Construction
of the pattern matching machine takes time proportional
to the sum of the lengths of the keywords. The number
of state transitions made by the pattern matching
machine in processing the text string is independent of
the number of keywords. The algorithm has been used to
improve the speed of a library bibliographic search pro-
gram by a factor of 5 to 10.
Keywords and Phrases: keywords and phrases, string
pattern matching, bibliographic search, information re-
trieval, text-editing, finite state machines, computational
complexity.
CR Categories: 3.74, 3.71, 5.22, 5.25
Copyright © 1975, Association for Computing Machinery, Inc.
General permission to republish, but not for profit, all or part of
this material is granted, provided that ACM's copyright notice is
given and that reference is made to this publication, to its date of
issue, and to the fact that reprinting privileges were granted by
permission of the Association for Computing Machinery.
Authors' present addresses: A. V. Aho, Bell Laboratories,
Murray Hill, N.J. 07974. M. J. Corasick, The MITRE Corporation,
Bedford, Mass. 01730.
333
1. Introduction
In many information retrieval and text-editing appli-
cations it is necessary to be able to locate quickly some or
all occurrences of user-specified patterns of words and
phrases in text. This paper describes a simple, efficient
algorithm to locate all occurrences of any of a finite
number of keywords and phrases in an arbitrary text
string.
The approach should be familiar to those acquainted
with finite automata. The algorithm consists of two parts.
In the first part we construct from the set of keywords a
finite state pattern matching machine; in the second part
we apply the text string as input to the pattern matching
machine. The machine signals whenever it has found a
match for a keyword.
Using finite state machines in pattern matching appli-
cations is not new [4, 8, 17], but their use seems to be
frequently shunned by programmers. Part of the reason
for this reluctance on the part of programmers may be
due to the complexity of programming the conventional
algorithms for constructing finite automata from regular
expressions [3, 10, 15], particularly if state minimization
techniques are needed [2, 14]. This paper shows that an
efficient finite state pattern matching machine can be
constructed quickly and simply from a restricted class of
regular expressions, namely those consisting of finite sets
of keywords. Our approach combines the ideas in the
Knuth-Morris-Pratt algorithm [13] with those of finite
state machines.
Perhaps the most interesting aspect of this paper is
the amount of improvement the finite state algorithm
gives over more conventional approaches. We used the
finite state pattern matching algorithm in a library biblio-
graphic search program. The purpose of the program is
to allow a bibliographer to find in a citation index all titles
satisfying some Boolean function of keywords and
phrases. The search program was first implemented with
a straightforward string matching algorithm. Replacing
this algorithm with the finite state approach resulted in a
program whose running time was a fifth to a tenth of the
original program on typical inputs.
2. A Pattern Matching Machine
This section describes a finite state string pattern
matching machine that locates keywords in a text string.
The next section describes the algorithms to construct
such a machine from a given finite set of keywords.
In this paper a
string
is simply a finite sequence of
symbols. Let K = {Yl,Y2 .....
Yk}
be a finite set of
strings which we shall call
keywords
and let x be an arbi-
trary string which we shall call the
text string.
Our prob-
lem is to locate and identify all substrings of x which are
keywords in K. Substrings may overlap with one another.
A pattern matching machine for K is a program which
takes as input the text string x and produces as output
the locations in x at which keywords of K appear as sub-
strings. The pattern matching machine consists of a set
of
states.
Each state is represented by a number. The
machine processes the text string x by successively read-
ing the symbols in x, making state transitions and occa-
Communications June 1975
of Volume 18
the ACM Number 6
mostovoi1234
- 粉丝: 31
- 资源: 240
上传资源 快速赚钱
- 我的内容管理 收起
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
会员权益专享
最新资源
- zigbee-cluster-library-specification
- JSBSim Reference Manual
- c++校园超市商品信息管理系统课程设计说明书(含源代码) (2).pdf
- 建筑供配电系统相关课件.pptx
- 企业管理规章制度及管理模式.doc
- vb打开摄像头.doc
- 云计算-可信计算中认证协议改进方案.pdf
- [详细完整版]单片机编程4.ppt
- c语言常用算法.pdf
- c++经典程序代码大全.pdf
- 单片机数字时钟资料.doc
- 11项目管理前沿1.0.pptx
- 基于ssm的“魅力”繁峙宣传网站的设计与实现论文.doc
- 智慧交通综合解决方案.pptx
- 建筑防潮设计-PowerPointPresentati.pptx
- SPC统计过程控制程序.pptx
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功
评论1