Sensitive word filtering algorithms in Python
Date: 2024-01-08 17:20:34
Below are Python implementations of two common sensitive-word filtering algorithms:
1. Replacement method:
```python
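def sensitive_word_filter(text, sensitive_words):
    """Replace every occurrence of each sensitive word with asterisks."""
    for word in sensitive_words:
        # Mask the word with one '*' per character.
        text = text.replace(word, '*' * len(word))
    return text


sensitive_words = ['敏感词1', '敏感词2', '敏感词3']
# The bare '敏感词' in the sample text is not in the list, so it is left as-is.
text = '这是一段包含敏感词的文本,敏感词1和敏感词2都需要被过滤。'
filtered_text = sensitive_word_filter(text, sensitive_words)
print(filtered_text)  # Output: 这是一段包含敏感词的文本,****和****都需要被过滤。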
```
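The replacement method above scans the text once per word, and when one listed word is a prefix of another (for example `'敏感词'` and `'敏感词1'`), the result depends on the order of the list. The following is a minimal sketch of a single-pass variant, assuming the standard-library `re` module is acceptable; `build_pattern` and `regex_filter` are hypothetical names that do not appear in the original.

```python
import re


def build_pattern(sensitive_words):
    # Hypothetical helper: escape each word and try longer words first,
    # so a longer word is preferred over any listed prefix of it.
    ordered = sorted(set(sensitive_words), key=len, reverse=True)
    return re.compile('|'.join(re.escape(word) for word in ordered))


def regex_filter(text, pattern):
    # Replace each match with one '*' per matched character.
    return pattern.sub(lambda m: '*' * len(m.group()), text)


pattern = build_pattern(['敏感词1', '敏感词2', '敏感词'])
print(regex_filter('敏感词1和敏感词都需要被过滤。', pattern))  # Output: ****和***都需要被过滤。
```

Compiling the pattern once and reusing it avoids rebuilding it for every text. For very large word lists, a trie-based approach such as the DFA algorithm is likely the better fit.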
2. DFA (deterministic finite automaton) algorithm:
```python
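class TrieNode:
    """A node of the trie that serves as the DFA's state graph."""

    def __init__(self):
        self.children = {}   # Maps a character to the next TrieNode.
        self.is_end = False  # True if a sensitive word ends at this node.


class DFAFilter:
    def __init__(self):
        self.root = TrieNode()

    def add_word(self, word):
        """Insert a sensitive word into the trie, one character per level."""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def filter(self, text):
        """Mask the shortest sensitive word that starts at each position."""
        result = []
        start = 0
        while start < len(text):
            node = self.root
            end = start
            while end < len(text) and text[end] in node.children:
                node = node.children[text[end]]
                if node.is_end:
                    # text[start:end + 1] is a sensitive word: mask it.
                    result.append('*' * (end - start + 1))
                    start = end + 1
                    break
                end += 1
            else:
                # The inner loop ended without a match: keep this character.
                result.append(text[start])
                start += 1
        return ''.join(result)


dfa_filter = DFAFilter()
dfa_filter.add_word('敏感词1')
dfa_filter.add_word('敏感词2')
dfa_filter.add_word('敏感词3')
# The bare '敏感词' in the sample text is not a registered word, so it is kept.
text = '这是一段包含敏感词的文本,敏感词1和敏感词2都需要被过滤。'
filtered_text = dfa_filter.filter(text)
print(filtered_text)  # Output: 这是一段包含敏感词的文本,****和****都需要被过滤。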
```
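The `DFAFilter.filter` method above stops at the first word ending it reaches, so if both `'敏感'` and `'敏感词'` are registered, only the two-character prefix is masked. The following is a minimal sketch of a longest-match variant, assuming a plain nested-dict trie instead of the `TrieNode` class; `END`, `build_trie`, and `filter_longest` are hypothetical names introduced for illustration.

```python
END = object()  # Hypothetical sentinel key marking the end of a word.


def build_trie(words):
    # Build a nested-dict trie: each character maps to a child dict.
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[END] = True
    return root


def filter_longest(text, root):
    result = []
    start = 0
    while start < len(text):
        node = root
        match_end = -1  # Index of the last character of the longest match.
        end = start
        while end < len(text) and text[end] in node:
            node = node[text[end]]
            if END in node:
                match_end = end  # Remember this match, but keep scanning.
            end += 1
        if match_end >= 0:
            # Mask the longest word found starting at `start`.
            result.append('*' * (match_end - start + 1))
            start = match_end + 1
        else:
            # No word starts here: keep the character unchanged.
            result.append(text[start])
            start += 1
    return ''.join(result)


trie = build_trie(['敏感', '敏感词'])
print(filter_longest('敏感词和敏感内容', trie))  # Output: ***和**内容
```

Whether shortest or longest match is preferable depends on the word list. Longest match avoids leaving the tail of a longer word visible, at the cost of scanning a little further at each position.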