Counting how often each word occurs in Python and printing the results in descending order
Posted: 2023-12-19 22:07:32
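Python's collections module makes word counting straightforward. The steps are:
1. Read the text into a string variable;
2. Clean the string: strip punctuation and convert it to lowercase;
3. Split the string into a list of words with split();
4. Count the words with the Counter class from the collections module; the result is a Counter, a dict subclass that maps each word to its count;
5. Sort the entries by count in descending order and print them.
Here is a code example: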
```python
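import string
from collections import Counter
# Read the text file (UTF-8 is assumed here; adjust the encoding if your file differs)
with open('text.txt', 'r', encoding='utf-8') as f:
    content = f.read()
# Strip ASCII punctuation and convert to lowercase
content = content.translate(str.maketrans('', '', string.punctuation)).lower()
# Split the string into a list of words (on any whitespace)
words = content.split()
# Count the words with Counter
word_count = Counter(words)
# Sort by count in descending order and print the results
for word, count in sorted(word_count.items(), key=lambda x: x[1], reverse=True):
    print(word, count)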
```
Note that text.txt is the name of the text file to analyze; change it to match your own file.
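The same steps can also be wrapped in a small function so they can be tried on a string without a file. The sketch below is only an illustration: the name `count_words` and the sample sentence are made up for this example, and it uses `Counter.most_common()` in place of the explicit `sorted()` call, which gives the same descending order.

```python
import string
from collections import Counter


def count_words(text):
    """Return (word, count) pairs sorted by count, highest first.

    Hypothetical helper for illustration; not part of any library.
    """
    # Strip ASCII punctuation and lowercase the text
    cleaned = text.translate(str.maketrans('', '', string.punctuation)).lower()
    # most_common() with no argument returns every pair, ordered by count
    return Counter(cleaned.split()).most_common()


sample = "The cat saw the dog. The dog ran!"
for word, count in count_words(sample):
    print(word, count)
```

Because `string.punctuation` only covers ASCII punctuation, typographic quotes or non-English punctuation would need extra handling.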
Related questions
Read an English text file in Python, count how often each word occurs, and print the 10 most frequent words in descending order
This can be done with a Python dictionary and regular expressions. Here is a code example:
```python
import re
# Open the file and read its contents
with open("file.txt", "r") as file:
    text = file.read()
# Replace punctuation and newlines with spaces using regular expressions
text = re.sub(r'[^\w\s]', ' ', text)
text = re.sub(r'\n', ' ', text)
# Split the text into a list of words
words = text.split()
# Count how often each word occurs
word_counts = {}
for word in words:
    if word not in word_counts:
        word_counts[word] = 0
    word_counts[word] += 1
# Sort by count in descending order and print the top 10 words
sorted_word_counts = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)
for word, count in sorted_word_counts[:10]:
    print(f"{word}: {count}")
```
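In this example, we first open and read the text file. We then use regular expressions to replace punctuation and newlines with spaces, and split the text into a list of words. Next, a dictionary records how many times each word occurs. Finally, we sort the entries by count in descending order and print the top 10 words with their counts. Note that the example does not lowercase the text, so "The" and "the" are counted as different words.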
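As an alternative to the hand-written dictionary loop, `collections.Counter` can produce the top N entries directly through `most_common(n)`. The following is a minimal sketch, not the method used above: the function name `top_words` and the sample string are assumptions for illustration, and it additionally lowercases the text so that case variants are merged.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words as (word, count) pairs.

    Hypothetical helper for illustration.
    """
    # Replace punctuation with spaces, lowercase, then split on whitespace
    words = re.sub(r'[^\w\s]', ' ', text).lower().split()
    # most_common(n) returns at most n pairs, highest count first
    return Counter(words).most_common(n)


sample = "a b a c b a d"
print(top_words(sample, n=2))
```

Passing the contents of a file read with `open(...).read()` as `text` would reproduce the top-10 listing from the example above, apart from the added lowercasing.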
Word frequency: count how often each word occurs in the Zen of Python and print the results sorted by count in descending order.
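In Python, you can use the `Counter` class from the `collections` module for word-frequency counting. First break the text into a list of words, then create a `Counter` object to compute each word's frequency, and finally sort by frequency and print the results. Here is a simple example: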
```python
import re
from collections import Counter
# Text of the Zen of Python
zen_of_python = """
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
"""
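# Extract the words from the lowercased text with a regular expression
# (note: \w+ splits contractions, so "aren't" yields "aren" and "t")
words = re.findall(r'\b\w+\b', zen_of_python.lower())
# Count with Counter; most_common() returns pairs sorted by count, descending
word_counts = Counter(words).most_common()
# Print the results
for word, count in word_counts:
    print(f"{word}: {count}")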
```
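With `most_common()`, words that share the same count are listed in the order they were first encountered. If a reproducible alphabetical order among ties is preferred, one option is to sort on a composite key. The sketch below assumes that preference; the helper name `sort_counts` and the sample words are made up for illustration.

```python
from collections import Counter


def sort_counts(counts):
    """Sort (word, count) pairs by count descending, then word ascending.

    Hypothetical helper for illustration.
    """
    # Negating the count sorts high counts first; the word breaks ties
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


counts = Counter("than is better is than better is ugly".split())
for word, count in sort_counts(counts):
    print(f"{word}: {count}")
```

Here "better" is listed before "than" even though "than" appears first in the input, because both occur twice and the tie is broken alphabetically.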