帮我写一段用java实现的双数组Trie树匹配敏感词的算法例子
时间: 2024-02-21 15:01:26 浏览: 82
java 敏感词算法
好的,以下是一个简单的基于双数组Trie树的敏感词过滤算法的Java实现:
```java
import java.util.*;
public class DoubleArrayTrie {
private int[] base; // base数组
private int[] check; // check数组
private int[] fail; // fail指针数组
private boolean[] isWord; // 是否为单词结尾
private int size; // trie树中节点数
// 将敏感词列表构建为trie树
public void buildTrie(List<String> words) {
// 计算trie树的节点数
size = 1;
Map<Character, Integer> charToIndex = new HashMap<>();
charToIndex.put('\0', 0); // 根节点
for (String word : words) {
for (char c : word.toCharArray()) {
if (!charToIndex.containsKey(c)) {
charToIndex.put(c, size++);
}
}
}
// 初始化base、check、fail和isWord数组
base = new int[size];
check = new int[size];
fail = new int[size];
isWord = new boolean[size];
// 构建trie树
Arrays.fill(base, -1);
Arrays.fill(check, -1);
Arrays.fill(fail, -1);
for (String word : words) {
int cur = 0;
for (char c : word.toCharArray()) {
int next = charToIndex.get(c);
if (base[cur] == -1) {
base[cur] = next;
} else {
int t = base[cur] + next;
int k = -1;
while (true) {
if (check[t + k] == -1) {
check[t + k] = cur;
base[next] = t + k;
break;
} else {
k--;
}
}
}
cur = base[cur] + next;
}
isWord[cur] = true;
}
}
// 计算fail指针
public void buildFail() {
Queue<Integer> queue = new LinkedList<>();
for (int i = 0; i < 256; i++) {
if (base[i] != -1) {
fail[base[i]] = 0;
queue.offer(base[i]);
}
}
while (!queue.isEmpty()) {
int cur = queue.poll();
for (int i = 0; i < 256; i++) {
int next = base[cur] + i;
if (check[next] == cur) {
fail[next] = base[fail[cur]] + i;
queue.offer(next);
} else if (check[next] == -1) {
check[next] = fail[next] = base[fail[cur]] + i;
}
}
isWord[cur] |= isWord[fail[cur]]; // 更新是否为单词结尾
}
}
// 匹配文本串中的敏感词
public List<String> match(String text) {
int cur = 0;
List<String> result = new ArrayList<>();
for (int i = 0; i < text.length(); i++) {
char c = text.charAt(i);
while (base[cur] == -1 && cur != 0) { // 回溯到可跳转的状态
cur = fail[cur];
}
cur = base[cur] + charToIndex.getOrDefault(c, 0);
if (isWord[cur]) { // 匹配到了一个敏感词
int start = i - word.length() + 1;
result.add(text.substring(start, i + 1));
}
}
return result;
}
}
```
使用示例:
```java
List<String> words = Arrays.asList("敏感词1", "敏感词2", "敏感词3");
DoubleArrayTrie trie = new DoubleArrayTrie();
trie.buildTrie(words);
trie.buildFail();
List<String> result = trie.match("这是一个包含敏感词1和敏感词2的文本");
System.out.println(result); // ["敏感词1", "敏感词2"]
```
希望对您有帮助!
阅读全文