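Word frequency count. Given the short English passage below, write a program that does the following:

(1) Count how many times each word occurs in the passage, and output the ten most frequent words.

`sentence='We can never get it by working for it directly.It comes,\ like happiness,to those who are aiming at something higher. It is\ a byproduct of great,simple living. The joy of living comes from\ what we put into living, not from what we seek to get from it.'`

[Hints]

(1) Before counting, preprocess the text, for example by removing punctuation and normalizing case.

(2) Use the string split() method to extract the words from the text into a list.

(3) Iterate over the list and tally its elements, storing the result in a dictionary whose keys are words and whose values are counts.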
Time: 2024-02-27 14:57:30
Sure, here is one way to solve this in Python:
```python
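import re

# Text to analyze. Adjacent string literals are concatenated, which replaces
# the backslash continuations and keeps a space between the source lines.
sentence = (
    'We can never get it by working for it directly.It comes, '
    'like happiness,to those who are aiming at something higher. It is '
    'a byproduct of great,simple living. The joy of living comes from '
    'what we put into living, not from what we seek to get from it.'
)

# Preprocess: lowercase, then replace punctuation with a space (not an empty
# string) so that "directly.It" and "great,simple" do not merge into one word.
sentence = sentence.lower()
sentence = re.sub(r'[^\w\s]', ' ', sentence)

# Split on whitespace into a list of words
words = sentence.split()

# Count occurrences: key = word, value = count
word_count = {}
for word in words:
    if word not in word_count:
        word_count[word] = 1
    else:
        word_count[word] += 1

# Print the ten most frequent words (ties keep first-appearance order)
top_words = sorted(word_count.items(), key=lambda x: x[1], reverse=True)[:10]
for word, count in top_words:
    print(word, count)
```
The output is:
```
it 5
we 3
living 3
from 3
get 2
comes 2
to 2
of 2
what 2
can 1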
```
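The code first preprocesses the text by converting it to lowercase and stripping out punctuation. It then uses the split() method to break the text into a list of words. Next, it iterates over that list, counts how many times each word occurs, and stores the result in the dictionary word_count, where each key is a word and each value is its count. Finally, it uses the built-in sorted() function to order the dictionary's items by value in descending order and prints the ten most frequent words.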
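The same steps can also be written more compactly with the standard library's `collections.Counter`. The following is only a minimal sketch, not the approach the exercise hints at: the function name `most_frequent_words` and the text in `sample` are hypothetical, and it assumes a word is any run of the letters a-z, so a contraction such as "don't" would be split in two.

```python
import re
from collections import Counter


def most_frequent_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # [a-z]+ matches runs of letters, so any punctuation acts as a separator
    # and "comes,like" is read as two words rather than one.
    words = re.findall(r"[a-z]+", text.lower())
    # most_common() sorts by count, highest first.
    return Counter(words).most_common(n)


# Hypothetical sample text, used only to illustrate the call.
sample = "It comes,like happiness,to those who seek it. It is simple."
for word, count in most_frequent_words(sample, 3):
    print(word, count)
```

`Counter` takes the place of the manual dictionary loop and `most_common()` takes the place of the `sorted()` call with slicing. Extracting words with `re.findall()` also sidesteps the question of what to substitute for punctuation before calling split().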