How to Use a Trie for Word Frequency Counting
Time: 2024-05-06 22:19:50
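1. Build the trie
Start with an empty trie. In Python, a trie can be represented with dictionaries: each node maps a character to its child node. Insert every word into the trie and, during insertion, record how many times the word has occurred on the node where the word ends.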
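Step 1 notes that the trie can be represented with plain Python dictionaries. A minimal sketch of that representation, assuming a hypothetical sentinel key `"$"` that stores the count on the node where a word ends (and assuming no word contains `$` itself). The helper names `insert` and `get_count` are illustrative only:

```python
# A trie as nested dicts: each key is a character, each value is the child dict.
# COUNT is a hypothetical sentinel key holding how many times the word ending at
# that node was inserted; this assumes no word contains the "$" character.
COUNT = "$"


def insert(trie, word):
    """Walk (creating as needed) the path for word and bump the count at its end."""
    node = trie
    for c in word:
        node = node.setdefault(c, {})
    node[COUNT] = node.get(COUNT, 0) + 1


def get_count(trie, word):
    """Return how many times word was inserted, or 0 if it never was."""
    node = trie
    for c in word:
        if c not in node:
            return 0
        node = node[c]
    return node.get(COUNT, 0)


trie = {}
for w in ["to", "tea", "to", "ten"]:
    insert(trie, w)
print(get_count(trie, "to"), get_count(trie, "te"), get_count(trie, "ten"))  # → 2 0 1
```

Nested dictionaries keep the structure light. The class-based example code later in this article stores the count in a dedicated `freq` field instead, which avoids any collision between the sentinel key and a real character.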
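2. Scan the text
Go through the words of the text one by one and insert each into the trie. If a word is already in the trie, its count is incremented by 1. Normalize each word first (for example, lowercase it and strip punctuation); otherwise "This" and "this", or "words." and "words", are counted as different words.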
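A sketch of step 2, assuming a "word" is a run of `\w` characters after lowercasing (a tokenization rule chosen here for illustration; the original does not specify one). `Node`, `tokenize`, and `build_trie` are hypothetical names:

```python
import re


class Node:
    """Minimal trie node: child nodes keyed by character, plus a word count."""

    def __init__(self):
        self.children = {}
        self.freq = 0


def tokenize(text):
    # Assumption: a word is a run of \w characters; lowercasing merges "This"/"this"
    # and the regex drops punctuation such as the period in "words."
    return re.findall(r"\w+", text.lower())


def build_trie(text):
    root = Node()
    for word in tokenize(text):
        node = root
        for c in word:
            if c not in node.children:
                node.children[c] = Node()
            node = node.children[c]
        node.freq += 1  # 1 on the first occurrence, then +1 for each repeat
    return root


sample = "Some words. Some WORDS!"
print(sample.split())    # → ['Some', 'words.', 'Some', 'WORDS!']
print(tokenize(sample))  # → ['some', 'words', 'some', 'words']
```

Plain `str.split()` only splits on whitespace, so punctuation stays attached to the neighboring word; the regex avoids that.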
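3. Collect the frequencies
Traverse the trie and, at every node whose count is greater than zero, output the word spelled by the path from the root to that node together with its count. No further summing is needed: the count stored at a word's end node is already the total number of times that word appears in the text.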
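The traversal in step 3 can be sketched as a depth-first walk that carries the prefix spelled so far. This assumes the same `children`/`freq` node layout as the example code; `collect` is a hypothetical helper, and an explicit stack replaces recursion so that very long words cannot hit Python's recursion limit:

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> child TrieNode
        self.freq = 0       # count of the word ending at this node


def insert(root, word):
    node = root
    for c in word:
        if c not in node.children:
            node.children[c] = TrieNode()
        node = node.children[c]
    node.freq += 1


def collect(root):
    """Yield (word, count) for every node with count > 0, depth-first."""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if node.freq > 0:
            yield prefix, node.freq
        for c, child in node.children.items():
            stack.append((child, prefix + c))


root = TrieNode()
for w in "to tea to ten".split():
    insert(root, w)
print(sorted(collect(root)))  # → [('tea', 1), ('ten', 1), ('to', 2)]
```

The traversal order follows the stack, not frequency or alphabet, which is why the result is sorted before printing here and why step 4 sorts explicitly.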
4. Sort and output
Sort all words by count in descending order, then output each word together with its count.
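For step 4, a possible sketch using `sorted` with a tie-breaking key, so that words with equal counts come out in alphabetical order (the tie-break is an assumption; the original only specifies descending count). When only the top k words are needed, `heapq.nsmallest` with the same key avoids sorting the whole list. The `counts` data is made up for illustration:

```python
import heapq

counts = {"this": 2, "is": 2, "a": 1, "sample": 2, "another": 1}

# Full ranking: negate the count for descending order, then break ties by word.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for word, f in ranked:
    print(word, f)  # prints: is 2, sample 2, this 2, a 1, another 1 (one per line)

# Only the k most frequent words, using the same ordering key.
top2 = heapq.nsmallest(2, counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(top2)  # → [('is', 2), ('sample', 2)]
```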
Example code:
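```python
import re


class TrieNode:
    def __init__(self):
        self.children = {}  # maps a character to the child TrieNode
        self.freq = 0       # number of times the word ending at this node was inserted


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Insert word, creating missing nodes, and increment its count."""
        node = self.root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.freq += 1

    def search(self, word):
        """Return how many times word was inserted, or 0 if it never was."""
        node = self.root
        for c in word:
            if c not in node.children:
                return 0
            node = node.children[c]
        return node.freq


def word_frequency(text):
    # Lowercase and extract runs of word characters, so "words." and "words"
    # are counted as the same word (text.split() would keep the period).
    words = re.findall(r"\w+", text.lower())
    trie = Trie()
    for word in words:
        trie.insert(word)
    # Look up each distinct word once; dicts keep first-seen order.
    freq = {}
    for word in words:
        if word not in freq:
            freq[word] = trie.search(word)
    # Sort by count, highest first; the sort is stable, so ties keep first-seen order.
    sorted_freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)
    for word, f in sorted_freq:
        print(word, f)
```
Example call:
```python
text = "This is a sample text with some words. This is another sample text with some different words."
word_frequency(text)
```
Output:
```
this 2
is 2
sample 2
text 2
with 2
some 2
words 2
a 1
another 1
different 1
```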