Aho-Corasick算法
时间: 2024-06-14 14:07:41 浏览: 210
基于 Double Array Trie 的 Aho Corasick算法的 非常快的C# 实现_代码_下载
Aho-Corasick算法是一种高效的字符串匹配算法,它可以在一次扫描文本的过程中同时查找多个模式串。该算法通过将模式串预处理为确定有限状态自动机,从而实现快速的匹配过程。与其他算法相比,Aho-Corasick算法的时间复杂度为O(n),与模式串的数量和长度无关。
以下是Aho-Corasick算法的一种Java实现示例:
```java
import java.util.*;
class AhoCorasick {
static class Node {
Map<Character, Node> children;
Node fail;
List<String> outputs;
Node() {
children = new HashMap<>();
fail = null;
outputs = new ArrayList<>();
}
}
static void buildTrie(Node root, List<String> patterns) {
for (String pattern : patterns) {
Node curr = root;
for (char c : pattern.toCharArray()) {
curr.children.putIfAbsent(c, new Node());
curr = curr.children.get(c); }
curr.outputs.add(pattern);
}
}
static void buildFailure(Node root) {
Queue<Node> queue = new LinkedList<>();
for (Node child : root.children.values()) {
child.fail = root;
queue.add(child);
}
while (!queue.isEmpty()) {
Node curr = queue.poll();
for (Map.Entry<Character, Node> entry : curr.children.entrySet()) {
char c = entry.getKey(); Node child = entry.getValue();
queue.add(child);
Node failNode = curr.fail;
while (failNode != null && !failNode.children.containsKey(c)) {
failNode = failNode.fail;
}
child.fail = failNode != null ? failNode.children.get(c) : root;
child.outputs.addAll(child.fail.outputs);
}
}
}
static List<String> search(Node root, String text) {
List<String> matches = new ArrayList<>();
Node curr = root;
for (char c : text.toCharArray()) {
while (curr != null && !curr.children.containsKey(c)) {
curr = curr.fail;
}
curr = curr != null ? curr.children.get(c) : root;
for (String output : curr.outputs) {
matches.add(output);
}
}
return matches;
}
public static void main(String[] args) {
List<String> patterns = Arrays.asList("he", "she", "his", "hers");
String text = "ushers";
Node root = new Node();
buildTrie(root, patterns);
buildFailure(root);
List<String> matches = search(root, text);
System.out.println(matches); // 输出:[she, he, hers]
}
}
```
阅读全文