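In Python, given an arbitrary English article, count the word frequencies, remove some uninformative words, sort by frequency from high to low, and print the top five words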
Time: 2023-09-25 19:08:11
Sure. Below is a sample article; we will use Python to count its word frequencies and sort them:
```python
article = """
Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. Leading AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".
"""
import re
from collections import Counter

# Stop words to ignore; a set makes membership checks fast
stop_words = {'the', 'and', 'of', 'as', 'to', 'that', 'in', 'is', 'by', 'its', 'a', 'an'}

# Strip punctuation, then lowercase and split on whitespace
article = re.sub(r'[^\w\s]', '', article)
words = article.lower().split()
# Drop the stop words
words = [word for word in words if word not in stop_words]

# Count word frequencies and sort from most to least frequent
word_counts = Counter(words)
sorted_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)

# Print the five most frequent words
for word, count in sorted_words[:5]:
    print(f"{word}: {count}")
```
Output:
```
intelligence: 4
artificial: 2
ai: 2
machines: 2
humans: 2
```
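As you can see, the script counts the word frequencies in the article, sorts them from high to low, and prints the top five words. Words with the same count appear in order of first appearance, because Python's sort is stable and `Counter` preserves insertion order.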
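To reuse the same steps on other texts, they can be wrapped in a function. The following is only a minimal sketch: the names `top_words` and `DEFAULT_STOP_WORDS` are hypothetical, the stop-word list is copied from the code above, and it uses `Counter.most_common` in place of the manual `sorted` call. It also tokenizes with `re.findall(r"\w+", ...)`, which behaves slightly differently from stripping punctuation first (for example, "don't" becomes "don" and "t" rather than "dont").

```python
import re
from collections import Counter

# Hypothetical default stop-word list, copied from the example above
DEFAULT_STOP_WORDS = frozenset(
    {'the', 'and', 'of', 'as', 'to', 'that', 'in', 'is', 'by', 'its', 'a', 'an'}
)


def top_words(text, n=5, stop_words=DEFAULT_STOP_WORDS):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    # Lowercase, then extract runs of letters, digits, and underscores
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    # most_common(n) orders by count, highest first; ties keep first-seen order
    return counts.most_common(n)


sample = "The cat and the dog. The cat sleeps; the dog barks at the cat."
for word, count in top_words(sample, n=2):
    print(f"{word}: {count}")
```

For the short sample sentence this prints `cat: 3` and `dog: 2`. A longer stop-word list (for instance one covering "or", "with", and "such") can be passed through the `stop_words` parameter if the results are still dominated by function words.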