How to compute word frequencies for a Chinese article in Python and draw a word cloud
Date: 2023-12-03 09:44:14 · Views: 248
To compute word frequencies for a Chinese article and draw a word cloud, you can use the jieba and wordcloud libraries in Python. The steps are as follows:
1. Install the jieba and wordcloud libraries. Run the following commands in a terminal:
```
pip install jieba
pip install wordcloud
```
2. Import the required libraries:
```python
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
```
3. Open and read the Chinese article:
```python
with open('article.txt', 'r', encoding='utf-8') as f:
    article = f.read()
```
4. Segment the text with jieba and count word frequencies:
```python
words = jieba.lcut(article)  # segment using jieba's precise mode
freq = {}
for word in words:
    if len(word) > 1:  # count only words longer than one character
        freq[word] = freq.get(word, 0) + 1
```
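To check the result of this step, the frequency dict can be sorted to list the most common words, e.g. the top 10. A minimal sketch using the standard library's `collections.Counter` (the sample frequencies below are illustrative stand-ins for the dict built above):

```python
from collections import Counter

# Illustrative sample frequencies standing in for the `freq` dict built above
freq = {"统计": 5, "词云": 3, "分词": 8, "文章": 2}

# most_common(n) returns (word, count) pairs sorted by count, descending
for word, count in Counter(freq).most_common(10):
    print(word, count)
```

`Counter(freq).most_common(10)` avoids writing the sort by hand; passing no argument returns all words in descending order of frequency.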
5. Generate the word cloud:
```python
mask = np.array(Image.open('mask.png'))  # open the mask image
wc = WordCloud(background_color='white', font_path='msyh.ttc', mask=mask, max_words=200, max_font_size=100, random_state=42)
wc.generate_from_frequencies(freq)  # build the word cloud from the frequency dict
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()
```
Here, the mask parameter sets the shape of the word cloud, max_words and max_font_size set the maximum number of words shown and the largest font size, and font_path specifies the font; for Chinese text a Chinese-capable font such as msyh.ttc is required, otherwise the words render as empty boxes.
The complete code:
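WordCloud also accepts a `stopwords` parameter (a set of words to exclude); alternatively, filler words can be dropped at the counting stage before the frequencies ever reach WordCloud. A minimal sketch of the latter, where both the stop-word set and the sample tokens are illustrative:

```python
# Illustrative stop-word set; in practice you would load a fuller list from a file
stopwords = {"我们", "可以", "一个", "没有"}

words = ["我们", "统计", "一个", "词频", "统计"]  # stand-in for jieba.lcut output
freq = {}
for word in words:
    # skip single characters and stop words
    if len(word) > 1 and word not in stopwords:
        freq[word] = freq.get(word, 0) + 1
print(freq)
```

Filtering before counting keeps the frequency dict small and makes the top-word list more meaningful, since high-frequency function words no longer dominate.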
```python
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
with open('article.txt', 'r', encoding='utf-8') as f:
    article = f.read()
words = jieba.lcut(article)
freq = {}
for word in words:
    if len(word) > 1:
        freq[word] = freq.get(word, 0) + 1
mask = np.array(Image.open('mask.png'))
wc = WordCloud(background_color='white', font_path='msyh.ttc', mask=mask, max_words=200, max_font_size=100, random_state=42)
wc.generate_from_frequencies(freq)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()
```
Here, article.txt is the article to process, mask.png is the shape image for the word cloud, and msyh.ttc is the font file.