
Counting word frequencies in an article and drawing a word cloud with Python jieba

Posted: 2023-07-04 11:07:58 Views: 91

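Sure. Here are the steps for counting word frequencies and drawing a word cloud with the jieba library in Python:

1. Install the jieba and wordcloud libraries:

```
pip install jieba wordcloud
```

2. Import the libraries you need:

```python
import jieba
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt
```

3. Read the text file and segment it into words:

```python
with open('article.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# jieba.cut returns a generator of tokens
words = jieba.cut(text)
```

4. Count the word frequencies:

```python
counter = Counter(words)
# Keep the 50 most frequent words
freq_dict = dict(counter.most_common(50))
```

5. Draw the word cloud:

```python
# Create the word cloud object
wc = WordCloud(font_path='msyh.ttc', background_color='white', width=800, height=600)
# Generate the word cloud from the frequency dictionary
wc.generate_from_frequencies(freq_dict)
# Display the word cloud
plt.imshow(wc)
plt.axis('off')
plt.show()
```

The complete code is as follows:

```python
import jieba
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Read the text file and segment it into words
with open('article.txt', 'r', encoding='utf-8') as f:
    text = f.read()
words = jieba.cut(text)

# Count the word frequencies
counter = Counter(words)
# Keep the 50 most frequent words
freq_dict = dict(counter.most_common(50))

# Draw the word cloud
wc = WordCloud(font_path='msyh.ttc', background_color='white', width=800, height=600)
wc.generate_from_frequencies(freq_dict)
plt.imshow(wc)
plt.axis('off')
plt.show()
```

Note that `article.txt` in the code above should be replaced with the path to the text file you want to process. `msyh.ttc` is the font file for Microsoft YaHei; if you do not have it, point `font_path` at any other font that contains Chinese glyphs. WordCloud's default font has no Chinese glyphs, so without a suitable font the words are drawn as empty boxes.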
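The counting step above passes every token from `jieba.cut` straight to `Counter`, so punctuation, line breaks, and one-character particles such as 的 are likely to dominate the top 50. One possible way to filter them first is sketched below. The helper `count_words` and the tiny `STOPWORDS` set are hypothetical names introduced for illustration, not part of jieba or wordcloud, and the token list is hard-coded so the sketch runs without jieba installed.

```python
from collections import Counter

# Hypothetical stop-word set for illustration; a real list would be much longer.
STOPWORDS = {"我们", "一个", "这个"}


def count_words(tokens, stopwords=STOPWORDS, top_n=50):
    """Count tokens, skipping whitespace, single characters, and stop words."""
    kept = []
    for token in tokens:
        token = token.strip()
        # Single-character tokens are mostly punctuation and particles.
        if len(token) < 2 or token in stopwords:
            continue
        kept.append(token)
    return dict(Counter(kept).most_common(top_n))


# Tokens as jieba.cut(text) might yield them (assumed sample data).
sample_tokens = ["我们", "喜欢", "词云", "，", "词云", "的", "\n", "分词", "词云"]
freq_dict = count_words(sample_tokens)
print(freq_dict)
```

In the full script you would call `count_words(jieba.cut(text))` in place of `dict(counter.most_common(50))` and pass the result to `wc.generate_from_frequencies`. The length check is a blunt rule that also drops meaningful one-character words, so a proper stop-word list may suit some texts better.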
