Use Python to print how often each character appears in Journey to the West, and generate a word cloud
Time: 2023-07-04 15:18:09
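Sure, here is one way to do it in Python:
First, read the text of Journey to the West from a file, segment it with the jieba library, and then count how often each character's names appear.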
```python
import jieba
from collections import Counter
import wordcloud
import matplotlib.pyplot as plt

# Read the text file
with open('西游记.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text with jieba
words = jieba.lcut(text)

# Map every alias to one canonical character name.
# Add or remove entries as needed. Generic words such as '师父' and '菩萨'
# can also refer to other characters, so they inflate the counts somewhat.
aliases = {
    '孙悟空': '孙悟空', '悟空': '孙悟空', '行者': '孙悟空',
    '孙行者': '孙悟空', '大圣': '孙悟空', '齐天大圣': '孙悟空',
    '唐僧': '唐僧', '师父': '唐僧', '三藏': '唐僧', '玄奘': '唐僧',
    '沙僧': '沙僧', '沙和尚': '沙僧', '悟净': '沙僧',
    '猪八戒': '猪八戒', '八戒': '猪八戒', '悟能': '猪八戒',
    '猪悟能': '猪八戒', '天蓬元帅': '猪八戒',
    '观音': '观音', '菩萨': '观音', '观世音菩萨': '观音',
}

# Count every token that is a known alias under its canonical name
counts = Counter(aliases[w] for w in words if w in aliases)

# Print the five most frequent characters and their counts
for name, count in counts.most_common(5):
    print(name, count)
```
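The counts above depend on how jieba happens to split each name. As a rough cross-check, the aliases can also be counted directly in the raw text with a regular expression. This is only a sketch: the helper name `count_characters` and the shortened alias table are assumptions made for illustration, not part of the original answer.

```python
import re
from collections import Counter

# Hypothetical, shortened alias table for illustration; extend as needed.
ALIASES = {
    '孙悟空': '孙悟空', '悟空': '孙悟空', '行者': '孙悟空',
    '猪八戒': '猪八戒', '八戒': '猪八戒',
    '唐僧': '唐僧', '三藏': '唐僧',
}

def count_characters(text, aliases=ALIASES):
    """Count canonical names by scanning the raw text for any alias."""
    # Sorting longest-first makes the longer alias win when one alias
    # is a prefix of another.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(a) for a in ordered))
    # finditer yields non-overlapping matches, so '孙悟空' is counted once
    # and not a second time as '悟空'.
    return Counter(aliases[m.group()] for m in pattern.finditer(text))

sample = '孙悟空见了八戒,悟空又叫三藏,行者不语'
result = count_characters(sample)
print(result.most_common())
```

Because this approach skips segmentation entirely, it can over-count when an alias happens to appear inside an unrelated word, so it is better used as a sanity check than as a replacement.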
Next, generate the word cloud image. The wordcloud library makes this straightforward.
```python
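# Generate the word cloud image.
# The default font has no Chinese glyphs, so a font is set explicitly.
# Assumption: 'simhei.ttf' is a placeholder; use a CJK font available on your system.
wc = wordcloud.WordCloud(font_path='simhei.ttf', width=800, height=600,
                         background_color='white')
wc.generate_from_frequencies(counts)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()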
```
The complete code is as follows:
```python
import jieba
from collections import Counter
import wordcloud
import matplotlib.pyplot as plt

# Read the text file
with open('西游记.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text with jieba
words = jieba.lcut(text)

# Map every alias to one canonical character name.
# Add or remove entries as needed. Generic words such as '师父' and '菩萨'
# can also refer to other characters, so they inflate the counts somewhat.
aliases = {
    '孙悟空': '孙悟空', '悟空': '孙悟空', '行者': '孙悟空',
    '孙行者': '孙悟空', '大圣': '孙悟空', '齐天大圣': '孙悟空',
    '唐僧': '唐僧', '师父': '唐僧', '三藏': '唐僧', '玄奘': '唐僧',
    '沙僧': '沙僧', '沙和尚': '沙僧', '悟净': '沙僧',
    '猪八戒': '猪八戒', '八戒': '猪八戒', '悟能': '猪八戒',
    '猪悟能': '猪八戒', '天蓬元帅': '猪八戒',
    '观音': '观音', '菩萨': '观音', '观世音菩萨': '观音',
}

# Count every token that is a known alias under its canonical name
counts = Counter(aliases[w] for w in words if w in aliases)

# Print the five most frequent characters and their counts
for name, count in counts.most_common(5):
    print(name, count)

# Generate the word cloud image.
# The default font has no Chinese glyphs, so a font is set explicitly.
# Assumption: 'simhei.ttf' is a placeholder; use a CJK font available on your system.
wc = wordcloud.WordCloud(font_path='simhei.ttf', width=800, height=600,
                         background_color='white')
wc.generate_from_frequencies(counts)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()
```
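Notes:
- The list of character names in the code can be extended or trimmed as needed.
- If you build the word cloud from the full segmented text instead of the character counts, you can keep common function words out of the image with the `stopwords` parameter of `WordCloud`, for example: `wc = wordcloud.WordCloud(width=800, height=600, background_color='white', stopwords={'的', '了', '是', '在', '和', '也', '就', '等'})`. Stopwords only take effect with `generate()` / `generate_from_text()`; they are ignored by `generate_from_frequencies()`.
- The full text of Journey to the West is large, so segmentation may take a while; use a smaller text file while testing.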
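Since `generate_from_frequencies()` does not apply stopwords, one option is to filter the tokens before counting them. The following is a minimal sketch under that assumption; the helper name `build_frequencies`, the `min_length` default, and the sample tokens are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Illustrative stopword set, taken from the note above.
STOPWORDS = {'的', '了', '是', '在', '和', '也', '就', '等'}

def build_frequencies(tokens, stopwords=STOPWORDS, min_length=2):
    """Count tokens, skipping stopwords and tokens shorter than min_length."""
    kept = (
        t for t in tokens
        if len(t.strip()) >= min_length and t not in stopwords
    )
    return Counter(kept)

# Hypothetical token list standing in for the output of jieba.lcut(text)
tokens = ['悟空', '的', '师父', '是', '唐僧', ',', '悟空', '笑', '了']
freqs = build_frequencies(tokens)
print(freqs.most_common())
```

The resulting `Counter` can be passed to `wc.generate_from_frequencies(freqs)` in the same way as `counts` in the answer's code.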