1. Count the ten most frequent words in *Romance of the Three Kingdoms* (Volume Two), including their occurrence counts. 2. Based on the result above, draw a bar chart of the word frequencies. 3. Generate a word cloud for *Romance of the Three Kingdoms* (Volume Two). Please take the above experimental results
Posted: 2023-12-23 14:03:01
Sure, here is a Python implementation:
```python
import jieba
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Read the text file (assumes the novel is saved as sgyy.txt in UTF-8)
with open("sgyy.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Segment the text with jieba
words = jieba.lcut(text)
# Count word frequencies
word_count = Counter(words)

# Remove punctuation and common stopwords so they don't dominate the ranking
stopwords = [",", "。", "了", "的", "和", "是", "在", "将军", "丞相", "不", "有", "为", "之", "以", "而", "其", "也", "则", "与", "等", "此", "彼", "乃", "何", "故", "乎", "哉", "一", "却", "不得", "不可", "不如", "不能", "不是", "不能不", "不要", "不知", "不见", "不说", "不敢", "不足", "不过", "之一", "之二", "之三", "之四", "之五", "之六", "之七", "之八", "之九", "之十", "十分", "什么", "这个", "那个", "如此", "但是", "虽然", "就是", "这样", "那样", "因为", "所以", "然后", "因此", "只是", "已经", "还是", "并且", "不过", "于是", "就像", "起来"]
for w in stopwords:
    del word_count[w]  # Counter ignores deletion of missing keys
# Take the ten most frequent words
top_words = word_count.most_common(10)
print("Top 10 words:", top_words)
# Draw the bar chart; the tick labels are Chinese words, so point
# Matplotlib at a CJK font (Microsoft YaHei here; substitute any installed one)
plt.rcParams["font.sans-serif"] = ["Microsoft YaHei"]
plt.bar(range(len(top_words)), [w[1] for w in top_words], tick_label=[w[0] for w in top_words])
plt.xlabel("Word")
plt.ylabel("Occurrences")
plt.title("Top 10 words in Romance of the Three Kingdoms (Volume Two)")
plt.show()
# Generate the word cloud (font_path must point to a Chinese font file)
wordcloud = WordCloud(font_path="msyh.ttc", background_color="white").generate(text)
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```
This assumes the novel's text is saved as `sgyy.txt`. The code segments the text with the jieba library and counts word frequencies with the `Counter` class. To keep function words out of the statistics, a list of stopwords is removed before ranking. Finally, Matplotlib draws the bar chart and the WordCloud library generates the word cloud.
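One limitation of the code above is that the word cloud is generated from the raw text, so the stopword filtering never reaches it. A small sketch of an alternative (the helper name and sample tokens below are illustrative, not from the original answer) is to filter while counting and then pass the resulting frequencies to `WordCloud.generate_from_frequencies`; the filtering step itself needs only the standard library:

```python
from collections import Counter

def count_filtered(tokens, stopwords):
    """Count token frequencies, skipping stopwords and single characters.

    Dropping one-character tokens is a common heuristic for Chinese word
    frequency, since segmenters emit many length-1 particles.
    """
    stop = set(stopwords)
    return Counter(t for t in tokens if t not in stop and len(t) >= 2)

# Tiny made-up token list standing in for jieba.lcut() output
tokens = ["曹操", "的", "曹操", "大军", "不过", "大军", "曹操", "了"]
freq = count_filtered(tokens, ["的", "不过", "了"])
print(freq.most_common(2))  # [('曹操', 3), ('大军', 2)]

# The resulting Counter can then be fed to the word cloud directly:
# WordCloud(font_path="msyh.ttc").generate_from_frequencies(freq)
```

Because the cloud is then built from the filtered counts, the stopwords removed earlier no longer appear in it.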
Note: install the jieba, wordcloud, and matplotlib libraries before running the code, e.g. `pip install jieba wordcloud matplotlib`.