Compute word frequencies, sort words by frequency, and build a new dictionary
Date: 2024-05-03 20:21:15
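The following Python example counts how often each word occurs in a string and stores the result in a new dictionary, ordered from the most frequent word to the least frequent: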
```python
from collections import Counter
# Define a string
text = "This is a sample text with several words and repeated words"
# Split the string into a list of words (on whitespace)
words = text.split()
# Count the frequency of each word with Counter
word_freq = Counter(words)
# Sort by frequency in descending order; dict preserves this order (Python 3.7+)
sorted_word_freq = dict(sorted(word_freq.items(), key=lambda x: x[1], reverse=True))
# Print the sorted result
print(sorted_word_freq)
```
Output:
```
{'words': 2, 'This': 1, 'is': 1, 'a': 1, 'sample': 1, 'text': 1, 'with': 1, 'several': 1, 'and': 1, 'repeated': 1}
```
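Each word's frequency has been counted, and the words are stored in a new dictionary ordered from highest to lowest frequency. Words with the same count keep their order of first appearance, because sorted is stable.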
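As an alternative to calling sorted by hand, Counter has a most_common method that returns (word, count) pairs already ordered from most to least frequent. The sketch below reuses the same sample string; the variable name top_two is only illustrative.

```python
from collections import Counter

text = "This is a sample text with several words and repeated words"

# most_common() with no argument returns every (word, count) pair,
# ordered from the most frequent to the least frequent
word_freq = Counter(text.split())
sorted_word_freq = dict(word_freq.most_common())

# most_common(n) keeps only the n most frequent entries
top_two = word_freq.most_common(2)

print(sorted_word_freq)
print(top_two)
```

This produces the same dictionary as the sorted version above and is convenient when only the top few words are needed.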
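Related questions
Count how many times each word occurs in Python and sort by frequency in descending order
This can be done with the Counter class from Python's collections module. Here is an example: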
```python
from collections import Counter
# Define the text
text = "This is a sample text with several words. We will count the frequency of each word appearing in this text."
# Lowercase the text and split it into a list of words
words = text.lower().split()
# Count the occurrences of each word with Counter
word_counts = Counter(words)
# Sort by frequency in descending order and print each word with its count
for word, count in sorted(word_counts.items(), key=lambda x: x[1], reverse=True):
    print(f"{word}: {count}")
```
Output:
```
this: 2
is: 1
a: 1
sample: 1
text: 1
with: 1
several: 1
words.: 1
we: 1
will: 1
count: 1
the: 1
frequency: 1
of: 1
each: 1
word: 1
appearing: 1
in: 1
text.: 1
```
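Here, Counter returns a dictionary-like object (a dict subclass) whose keys are the words and whose values are their counts. The sorted function orders the items: the key argument selects each item's value (the count), and reverse=True requests descending order. The for loop then walks the sorted result and prints each word with its count. Because split() only splits on whitespace, punctuation stays attached to the words, so "words." and "text." are counted as tokens of their own, separate from "words" and "text".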
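If punctuation should not affect the counts, one option is to extract the words with a regular expression instead of split(). The following is a minimal sketch that assumes English text where a word is a run of lowercase letters or apostrophes; the pattern and the variable name tokens are illustrative choices, not part of the original answer.

```python
import re
from collections import Counter

text = "This is a sample text with several words. We will count the frequency of each word appearing in this text."

# Keep only runs of letters and apostrophes, so a trailing period
# is dropped and "text." is counted together with "text"
tokens = re.findall(r"[a-z']+", text.lower())
word_counts = Counter(tokens)

# Print the three most frequent words
for word, count in word_counts.most_common(3):
    print(f"{word}: {count}")
```

With this tokenization both "this" and "text" reach a count of 2, since the final "text." no longer forms a separate token.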
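How to count word frequencies with a trie
1. Build the trie
Start with an empty trie. In Python, the children of each node can be stored in a dictionary. Every word is inserted into the trie, and the node where a word ends records how many times that word has been inserted.
2. Walk through the text
Go through each word in the text and insert it into the trie. If the word is already in the trie, increase its count by 1.
3. Collect the frequencies
Traverse the trie and read the count stored at the end node of each word; this is the number of times the word occurs in the text.
4. Sort and print
Sort all words by count from largest to smallest, then print each word with its count.
Example code (this version looks each word up with search rather than traversing the whole trie):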
```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps a character to its child node
        self.freq = 0       # number of inserted words that end at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        # Walk down the trie, creating missing nodes, then bump the count
        node = self.root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.freq += 1

    def search(self, word):
        # Return how many times word was inserted (0 if it is absent)
        node = self.root
        for c in word:
            if c not in node.children:
                return 0
            node = node.children[c]
        return node.freq


def word_frequency(text):
    words = text.lower().split()
    trie = Trie()
    for word in words:
        trie.insert(word)
    # Look up each distinct word's count in the trie
    freq = {}
    for word in words:
        freq[word] = trie.search(word)
    # Sort by count, highest first, and print
    sorted_freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)
    for word, f in sorted_freq:
        print(word, f)
```
Example call:
```python
text = "This is a sample text with some words. This is another sample text with some different words."
word_frequency(text)
```
Output:
```
this 2
is 2
sample 2
text 2
with 2
some 2
words. 2
a 1
another 1
different 1
```
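Step 3 above describes traversing the trie to collect the counts. A minimal sketch of such a traversal is shown below, using a depth-first walk that rebuilds each word from the characters on its path. The helper names insert and collect, and the short sample sentence, are hypothetical and chosen only for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps a character to its child node
        self.freq = 0       # number of inserted words that end at this node


def insert(root, word):
    """Insert word below root and increase its count (hypothetical helper)."""
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.freq += 1


def collect(node, prefix=""):
    """Yield (word, count) for every word stored at or below node."""
    if node.freq > 0:
        yield prefix, node.freq
    for ch, child in node.children.items():
        yield from collect(child, prefix + ch)


root = TrieNode()
sample = "this is a sample text with some words this is another sample text"
for word in sample.split():
    insert(root, word)

# Depth-first traversal gathers all counts; sort them highest first
counts = sorted(collect(root), key=lambda item: item[1], reverse=True)
for word, count in counts:
    print(word, count)
```

Unlike looking up each word with search, this approach needs only the trie itself, so the original word list does not have to be kept around after insertion.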