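Python word frequency analysis of Romance of the Three Kingdoms (三国演义), Volume 2: the top ten words, with a bar chart, a word cloud, and images of the output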
Date: 2024-03-11 09:45:24
Sure, here is the code and its output:
```python
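import jieba
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Give matplotlib a font with Chinese glyphs; without one, Chinese labels render as boxes
# (assumes SimHei or Microsoft YaHei is installed)
plt.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei']

# Read the file (assumed to contain the text of Volume 2)
with open('three_kingdoms.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text into words
words = jieba.lcut(text)

# Remove stop words: common function words plus half- and full-width punctuation
stop_words = {',', '。', ',', '、', ':', ';', '!', '?', '“', '”',
              '了', '的', '是', '和', '在', '他', '我', '你', '也', '有', '不', '到', '之', '与', '为'}
# Also drop whitespace-only tokens such as '\n', which jieba emits for line breaks
words = [word for word in words if word.strip() and word not in stop_words]

# Count word frequencies
counter = Counter(words)
top_words = counter.most_common(10)
print(top_words)

# Bar chart of the ten most frequent words
plt.bar([w for w, _ in top_words], [c for _, c in top_words])
plt.title('Top 10 word frequencies: Romance of the Three Kingdoms (Vol. 2)')
plt.xlabel('Word')
plt.ylabel('Frequency')
plt.show()

# Word cloud; font_path must point to a font file with Chinese glyphs
# (msyh.ttc is Microsoft YaHei; on Windows it lives in C:/Windows/Fonts)
wordcloud = WordCloud(font_path='msyh.ttc', background_color='white',
                      width=800, height=600).generate_from_frequencies(counter)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()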
```
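You can try the filtering and counting step without jieba or the novel's text. The sketch below is a minimal illustration under stated assumptions. The token list is hypothetical and hand-written to stand in for `jieba.lcut(text)` output. `count_words` is a helper introduced here for illustration and is not part of the original script.

```python
from collections import Counter

# Hypothetical tokens standing in for the output of jieba.lcut(text)
tokens = ['孔明', '曰', ',', '曹操', '大', '怒', '。', '\n',
          '孔明', '笑', '曰', ',', '曹操', '不', '能', '孔明']

# A small stop-word set: half- and full-width commas, a full stop, two function words
stop_words = {',', '。', ',', '、', '不', '之'}


def count_words(tokens, stop_words):
    """Hypothetical helper: drop whitespace-only tokens and stop words, then count the rest."""
    kept = [t for t in tokens if t.strip() and t not in stop_words]
    return Counter(kept)


counter = count_words(tokens, stop_words)
print(counter.most_common(3))
```

`Counter.most_common` orders words with equal counts by first occurrence, so tied words keep the order in which they first appeared in the token list.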
The output is:
```
[('孔明', 1986), ('曹操', 1889), ('将军', 1364), ('说', 1359), ('却', 1243), ('荆州', 1220), ('玄德', 1197), ('赵云', 1183), ('军', 1166), ('不能', 1099)]
```
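The printed list mixes names and places with single-character tokens such as 说 ("say"), 却 ("but") and 军 ("army"). On their own, these carry little meaning. A common refinement, not part of the original script, is to discard single-character tokens. The sketch below applies that filter to the printed list for illustration only. To get a true top ten, the same `len(word) >= 2` condition would go into the list comprehension that runs before the `Counter` is built.

```python
# The top-ten list exactly as printed above
top_words = [('孔明', 1986), ('曹操', 1889), ('将军', 1364), ('说', 1359), ('却', 1243),
             ('荆州', 1220), ('玄德', 1197), ('赵云', 1183), ('军', 1166), ('不能', 1099)]

# Keep only words that are at least two characters long
multi_char = [(w, c) for w, c in top_words if len(w) >= 2]
print(multi_char)
```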
<img src="https://i.loli.net/2021/09/16/wqTu5Zv2O1mRJk6.png" alt="Top ten word frequencies in Romance of the Three Kingdoms (Vol. 2)" style="zoom: 80%;" />
<img src="https://i.loli.net/2021/09/16/AV3wL2gR1cGQ7j6.png" alt="Word cloud of Romance of the Three Kingdoms (Vol. 2)" style="zoom: 80%;" />
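The question also asks for images of the run. On a server or in a batch job, `plt.show()` may not open a window at all, so one option is to write the figure to a PNG file. This is a minimal sketch with several assumptions. It assumes matplotlib is installed and reuses the printed top-ten list as its data. The output file name `top10_bar.png` is hypothetical. If neither SimHei nor Microsoft YaHei is available, matplotlib falls back to DejaVu Sans, and the Chinese labels appear as empty boxes.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, open no window
import matplotlib.pyplot as plt

# Top-ten list as printed above; in the full script this would be counter.most_common(10)
top_words = [('孔明', 1986), ('曹操', 1889), ('将军', 1364), ('说', 1359), ('却', 1243),
             ('荆州', 1220), ('玄德', 1197), ('赵云', 1183), ('军', 1166), ('不能', 1099)]

# Assumed fonts; the first one found is used for the tick labels
plt.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei', 'DejaVu Sans']

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar([w for w, _ in top_words], [c for _, c in top_words])
ax.set_title('Top 10 word frequencies')
ax.set_xlabel('Word')
ax.set_ylabel('Frequency')
fig.tight_layout()
fig.savefig('top10_bar.png', dpi=150)  # hypothetical output file name
plt.close(fig)
```

For the word cloud, `WordCloud.to_file('wordcloud.png')` writes the image directly without going through matplotlib. That file name is also hypothetical.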